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ABSTRACT 


Three  numerical  procedures  are  presented  for  updating  regressions.  All  three 
methods  are  based  on  QR  factorization,  but  after  that  they  use  different  philosophies 
to  update  the  regression  coefficients.  Elden’s  algorithm  updates  using  only  the 
triangular  matrix  R.  This  procedure  does  not  use  orthogonal  transformations,  but  it 
uses  hyperbolic  rotations.  The  modified  Gram-Schmidt  QR  process  is  used  by  Gragg- 
Leveque-Trangenstein’s  method  where  the  matrix  wdth  orthonormal  columns  is  stored 
and  updated.  Chan’s  algorithm  computes  a  column  permutation  n  and  a  QR 
factorization  of  a  matrix  A  such  that  a  rank  deficiency  of  A  wall  be  revealed.  Although 
the  three  methods  are  based  on  different  ideas  and  can  be  used  for  different  purposes 
their  comparison  shows  that  Chan’s  algorithm  is  the  only  accurate  one  in  the  rank 
deficient  case,  and  that  Gragg-Leveque-Trangenstein’s  method  is  the  cheapest  and  the 
most  stable. 
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I. 


INTRODUCTION 


A.  BACKGROUND 

Regression  analysis  provides  a  variety  of  modeling 
techniques  that  allows  the  analyst  to  relate  a  dependent 
variable  to  one  or  more  independent  variables.  The 
mathematical  complexity  of  the  model  and  the  degree  to  which 
it  is  a  realistic  model  will  depend  on  how  much  is  known  about 
the  process  being  studied  and  on  the  purpose  of  the  modeling 
exercise.  In  preliminary  studies  of  a  process  or  in  cases 
where  prediction  is  the  primary  objective,  the  models  will 
frequently  fall  in  the  class  of  models  that  are  linear  in  the 
parameters.  That  is,  the  parameters  enter  the  model  as  simple 
coefficients  of  the  independent  variables  or  functions  of  the 
independent  variables.  Such  models  will  be  referred  to  loosely 
as  linear  models.  In  many  statistical  problems,  it  is  useful 
to  express  the  dependent  variable  as  a  linear  function  of  the 
independent  variables.  Furthermore,  regression  analysis  can 
summarize  data  and  quantify  the  nature  and  strength  of  the 
relationships  among  variables.  Regression  analysis  can  also  be 
used  to  predict  new  values  of  the  dependent  variables  based  on 
observed  relationships. 


B.  ANALYSIS  OF  REGRESSION 

As  the  reader  will  see  in  the  following  chapter,  the  main 
purpose  of  this  thesis  is  to  analyze  the  efficiency  and 
numerical  stability  of  some  procedures  for  updating 
regressions.  Most  regression  problems,  however,  require 
decisions  on  which  variables  to  include  in  the  model,  the  form 
the  variables  should  take,  for  example,  X,  X^,  1/X,  etc.,  and 
the  functional  form  of  the  model.  It  is  assumed  that  there  is 
a  set  of  t  candidate  variables,  which  presumably  includes  all 
relevant  variables,  from  which  a  subset  of  r  variables  is  to 
be  chosen  for  the  regression  equation.  The  candidate  variables 
may  include  different  forms  of  the  same  basic  variable,  such 
as  X  and  X^,  and  the  selection  process  may  include  constraints 
on  which  variables  are  to  be  included.  For  example,  X  may  be 
forced  into  the  model  if  X^  is  in  the  selected  subset.  This  is 
a  common  constraint  in  building  polynomial  models. 

There  are  three  distinct  problem  areas  related  to  this 
general  topic: 

■  The  theoretical  effects  of  variable  selection  on  the 
least  squares  regression  results. 

■  The  computational  methods  for  finding  the  "best"  subset 
of  variables  for  each  subset  size. 

■  The  choice  of  subset  size  (for  the  final  model) ,  or  the 
"stopping  rule". 


2 


This  thesis  mainly  discusses  the  second  problem  area  of 
finding  the  more  efficient  and  numerically  stable 
computational  methods.  Also,  we  generally  discuss  the  criteria 
for  the  "best"  subset  size  choice,  in  other  words  the  criteria 
for  the  "stopping  rules". 

The  simplest  linear  model  involves  only  one  independent 
variable  and  states  that  the  true  mean  of  the  dependent 
variable  changes  at  a  constant  rate  as  the  value  of  the 
independent  variable  increases  or  decreases.  Thus,  the 
functional  relationship  between  the  true  mean  of  Y^,  E(Yj)  ,  and 
Xj  is  the  eguation  of  a  straight  line, 

E(3r,)=Po  +  Pi(^i)  , 

where  is  the  intercept,  or  the  value  of  E(Yj)  when  X  =  0, 
and  is  the  slope  of  the  line,  or  the  rate  of  change  in  E(Yi) 
per  unit  change  in  X.  The  constants  and  are  population 
parameters  which  are  to  be  estimated  from  the  sample  of 
observations . 

The  observations  of  the  dependent  variables,  Y^,  are 
assumed  to  be  random  observations  from  populations  of  random 
variables  with  the  mean  of  each  population  given  by  E(YJ  .  The 
deviation  of  an  observation  from  its  population  mean  E(Yi) 
is  taken  into  account  by  adding  a  random  error,  to  give 
the  statistical  model 
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(1.1) 


The  subscript  i  indicates  the  particular  observation  unit,  i 
=  l,2,...,n.  The  Xj  are  the  n  observations  of  the  independent 
variable  and  are  assumed  to  be  measured  without  error;  that 
is,  the  observed  values  of  X  are  assumed  to  be  a  set  of  known 

« 

constants.  The  and  X^  are  paired  observations;  both  are 
measured  on  every  observational  unit.  The  random  errors  • 

are  assumed  to  be  normally,  independently  and  identically 
distributed : [Ref .  1] 

e^~NID{0,a^)  . 


1.  The  Matrix  Model 

Given  the  above  regression  model,  we  can  write  it  in 
matrix  form  as  Y  =  X6  +  e ,  where  Y  is  the  n  x  1  column  vector 
of  observations  of  the  dependent  variable.  The  n  x  p  matrix  X, 
where  x^g  =  X20  =  ...  =  x,,3  =  1,  will  be  called  the  regression 
matrix,  and  the  Xig ' s  are  generally  chosen  so  that  the  column 
of  X  are  linearly  independent;  that  is  X  has  rank  p.  However, 
in  some  experimental  design  situations  the  elements  of  X  are 
chosen  to  be  0  or  1 ,  and  the  columns  of  X  may  be  linearly 
dependent.  In  this  case  X  is  commonly  called  the  design 
matrix.  Also,  is  the  p  x  1  vector  of  parameters  to  be 
estimated;  and  t  is  the  n  x  1  vector  of  random  errors. 
Writing  out  the  components  of  Y  =  X/5  +  e  yields 
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■*10 

^11 

•*12 

■  •  ■ 

{  0  \ 

Po 

/  \ 

Cl 

^22 

■  *  ‘  ■*2p 

Px 

C2 

. 

= 

. 

. 

•  ■  a  a 

a  a  a  ■ 

■ 

a 

+ 

. 

Y 

\  oj 

^nl 

•  •  •  ^npj 

(nxl)  (nxp')  (p'xl)  (nxl) 


The  elements  of  a  particular  row,  r,  of  X  are  the 
coefficients  of  the  corresponding  parameters  in  p  which  give 
E(YJ  .  The  vectors  Y  and  6  are  random  vectors;  the  elements  of 
these  vectors  are  random  variables.  The  matrix  X  is  considered 
to  be  a  matrix  of  known  constants.  The  vector  ^  is  a  vector  of 
unknown  constants  and  is  to  be  estimated  from  the  given  data. 

Using  the  above  assumptions  on  the  random  vector 
f  has  a  multivariate  normal  distribution  with  a  mean  of  the 
zero  (0)  vector.  The  variance-covariance  matrix  for  any  random 
vector  of  n  elements  is  defined  as  an  n  x  n  symmetric  matrix 
with  the  diagonal  elements  equal  to  the  variance  of  t\  and  the 
off-diagonal  elemients  equal  to  the  covariance  between  and 
f^.  Let  Z  be  a  nxl  vector  of  random  variables  Zj,  Z2,  Z3,  ... 
,z^;  the  variance-covariance  matrix  of  Z  is  the  following 
n  x  n  matrix. 
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Vajr(Z) 


var(z^)  cov(z^,z^) 
cov(z^,z^)  var(z^) 


■  ■  • 


(1.3) 


cov{z^,z^) 

...  cov{z^,z^) 

■  ■  •  •  •  • 

\cov{z^,z^)  cov(z^,z^)  ...  var(z^) 

The  variance-covariance  matrix  of  £  is  la^,  and  since 
£  is  independent  and  identically  distributed,  the  distribution 
of  £  is  written  in  shorthand  notation  as 

£-iyr(o,  Jo^) , 

where  I  is  the  n_x _ n  identity  matrix  and  is  the  common 

variance  of  all  £,.  Furthermore,  since  the  elements  of  X  and 
^  are  constants,  the  Xfi  term  in  the  model  is  a  set  of 
constants  being  added  to  the  vector  of  random  errors,  £.  Thus, 
Y  is  a  random  vector  with  mean  vector  and  variance- 

covariance  matrix  la’: 

£{y)  =f:(js:p+e)  =i:(jrp) +r(e)  =xp  (i.4) 

V^ar(y)  =l^ar(xp+e)  =l^ar(E)  =Jo2  (1.5) 

Var(Y)  is  the  same  as  Var(£)  since  adding  a  constant  to  a 
random  variable  does  not  change  the  variance.  When  £  is 
multivariate  normally  distributed,  Y  is  also  multivariate 
normally  distributed.  Thus, 
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Y~N{X^,Ia^)  (1.6) 

This  result  is  based  on  the  assumption  that  the  linear  model 


being  used  is  the  correct  model  [Ref.  2:. 68]. 


2 .  The  Normal  Equations 

Before  considering  the  problem  of  estimating  we 
note  that  we  are  referring  to  the  model  (2) ,  where  X^o  is  not 
necessarily  constrained  to  be  unity.  In  the  case  when  X^o  f  1 
we  have  to  use  a  notation  in  which  i  runs  froiii  0  to  p-l  rather 
than  1  to  p.  However,  since  the  major  application  of  the 
theory  is  to  the  case  X^o  =1/  it  is  convenient  to  separate  fio 
from  the  other  /3j's  right  from  the  outset. 

The  oldest  method  of  obtaining  an  estimate  of  ^  is  the 
method  of  least  squares.  It  dates  back  at  least  to  Gauss  and 
consists  of  minimizing  with  respect  to  ^  (see  Fig.  1) ; 

that  is,  setting  q  =  X^ ,  we  minimize  =  |1  Y  -  q||^  subject 

to  qe!R[X]  =  W,  where  W  is  the  range  space  of  X  {y:y=Xx  for 
some  X).  If  we  let  q  vary  in  W,  ||  Y  -  q||  ^  (the  square  of  the 
length  of  Y  -  q)  will  be  a  minimum  for  q  =  q*  when  (Y  -  q*)  is 
orthogonal  with  W  (see  Fig.  1) . 


Thus 


X^{Y  -  qn  =  0 


(1.7) 


or 


^  We  denote  the  transpose  of  a  vector  y  by  y’^,  and 
similarly  for  a  matrix  transpose. 
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Figure  1.  The  method  of  least  squares  consists  of  finding 
A  such  that  AB  is  a  minimum. 


X^q*  =  X^Y.  (1-8) 

Here  q"  is  uniquely  determined,  being  the  unique  orthogonal 
projection  of  Y  onto  W.  Now  given  that  the  columns  of  X  are 
linearly  independent,  there  exists  a  unique  vector  (P)  such 
that  q*  =  xP.  Therefore  substituting  in  (1.8)  we  have 

X^X  P  =  X^Y  (1-9) 

the  so-called  normal  equation(s).  At  this  point  we  assume  that 
X  has  rank  p,  so  X^X  is  positive  definite  and  therefore 
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nonsingular.  If  so,  equation  (1.9)  has  a  unique  solution, 
namely, 

p  =  (x^x)  x^y  (1-10) 

Here  0  is  called  the  ordinary  least  squares  estimate  of  0. 
Computational  methods  for  calculating  0  are  given  in  the 
following  chapters.  Notice  that  0  can  also  be  obtained  by 
writing 

e^e  =  (Y-Xfi)^(y-Xfi)  =  Y^y-2p^x^y+p'^x^xfi 

using  the  fact  that  0'^x'^y  =  (^'''x'^'Y)^  =  Y^'X^ ,  and  differentiating 
with  respect  to  0.  Thus  from  de'^e/d^  =  0  we  have 

-2X^Y*2X^X^  =  0 

or 

XTxp  =  x^Y. 

This  solution  for  0  gives  us  a  stationary  value  of  ,  and 
a  simple  algebraic  identity  confirms  that  0  is  a  minimum 
[Ref.  2]. 

The  multiplication  X'^’X  generates  a  p '  x  o'  matrix 
where  the  diagonal  elements  are  the  sums  of  squares  of  each  of 
the  independent  variables  and  the  off-diagonal  elements  are 
the  sums  of  products  between  independent  variables.  The 
general  form  is 
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x^x 


(1.11) 


n 

£^11^12 

•  E^ii-^ip 

E^n 

EVi^ 

£^l2 

£Xi2^ip 

E-»^« 

■  at 

E-^i^r^ip 

a  a  a  a  a  a 

•  •  ■  E-^i  , 

Summation  in  all  cases  is  over  i  =  1  to  n,  the  n  observations 
in  the  data.  When  only  one  independent  variable  is  involved, 
X'^X  consists  of  only  the  upper-left  2x2  matrix.  Inspection 
of  the  normal  equations  will  reveal  that  the  elements  in  this 
2x2  matrix  are  the  coefficients  of  (5^  and  0^^. 

The  elements  of  the  matrix  product  X^Y  are  the  sums  of 
products  between  each  independent  variable  in  turn  and  the 
dependent  variable; 


(1.12) 


The  first  element,  TVi,  is  the  sum  of  products  between  the 
vectors  of  ones  (the  first  column  of  X)  and  Y.  As  is  mentioned 
above,  if  only  one  independent  variable  is  involved,  X^Y 
consists  of  only  the  first  two  elements. 
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The  fitted  regression 


Y  =  X^, 


is  denoted  by 


Y  (  =  [(?i>])  ,  and  the  elements  of 

e  =  Y-Y  =  Y-X^  =  (I^-X(X'^X)-^  X'^)Y  =  {I^-P)Y,  say, 

are  called  the  residuals .  The  minimum  value  of  e^e,  namely, 

e'^e  = 

=  Y^Y-2^^X^Y+^'^X^XP 


=  Y^Y-^^X^Y+^^[X^X^-X^Y] 


=  Y^Y-^^x^Y  =  r^y-p’jr*'jrp, 


(1.13) 


is  called  the  residual  sum  of  squares  (RSS) .  Notice  that  Y  and 
e  are  uniquely  defined  by  0. 

C.  VARIABLE  SELECTION  IN  LEAST  SQUARES  ANALYSIS 

The  purpose  of  the  least  squares  analysis  will  influence 
the  manner  in  which  the  model  is  constructed.  Hocking  [Ref.  4] 
relates  six  potential  uses  of  regression  equations  given  by 
Mallows  [Ref.  3 ] : 

■  Providing  a  good  description  of  the  behavior  of  the 
response  variable 

■  Prediction  of  future  responses  and  estimation  of  mean 
responses 
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■  Extrapolation,  or  prediction  of  responses  outside  the 
range  of  the  data 

■  Estimation  of  parameters 

■  Control  of  a  process  by  varying  levels  of  input 

■  Developing  realistic  models  of  the  process. 

Assume  that  the  correct  model  involves  t  independent 
variables  but  that  a  subset  of  p  variables,  chosen  randomly  or 
on  the  basis  of  external  information,  is  used  in  the 
regression  equation.  Let  Xp  and  denote  submatrices  of  X  and 

that  relate  to  the  p  selected  variables.  (5^  will  denote  the 

least  squares  estimates  of  obtained  from  the  p-variate 
subset  model  and  MS(ReSp)  the  mean  squares  residual  obtained 
from  the  p-variate  subset  model.  After  the  above  the  following 
properties  are  summarized: 

■  MS(ReSp)  is  a  positively  biased  estimate  of  unless  the 
true  regression  coefficients  for  all  deleted  variables  are 
zero . 

■  Pp  is  a  biased  estimate  of  and  less  variable  than  the 

corresponding  statistics  obtained  from  the  t-variate  model 
[Ref.  4]. 

Stepwise  regression  methods  are  variable  selection  methods 
which  identify  good  (although  not  necessarily  the  best)  subset 
models,  with  considerably  less  computing  than  required  for  all 
possible  regressions.  The  subset  models  are  identified 
sequentially  by  adding  or  deleting,  depending  on  the  method. 
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the  one  variable  that  has  the  greatest  impact  on  the  residual 
sum  of  squares.  These  stepwise  methods  are  not  guaranteed  to 
find  the  "best"  subset  for  each  subset  size,  and  the  results 
produced  by  different  methods  may  not  agree  with  each  other. 

D.  RESEARCH  METHODOLOGY 

Given  the  regression  model  Y  =  +  e ,  where  X  is  n  x  p . 

a  number  of  computational  techniques  have  being  suggested  for 
solving  the  following  steps: 

■  Solve  the  normal  equations  X'^X  $  =  X'^Y , 

■  Calculate  the  residual  0  =  Y-J^  . 

■  Calculate  the  residual  sum  of  squares  RSS  =  e^e. 

■  Update  the  regression  model  (that  is,  add  or  remove  a 
row  of  X) . 

■  Add  or  remove  a  regressor  (that  is,  add  or  remove  a 
column  of  X. ) 

■  Calculate  an  F-statistic  for  a  general  linear 
hypothesis . 

Solving  the  above  steps  is  a  common  problem  in  a  computer 
laboratory.  These  problems  arise  in  a  variety  of  areas  and  in 
a  variety  of  contexts.  Linear  least  squares  problems  are 
particularly  difficult  to  solve  because  they  frequently 
involve  large  quantities  of  data,  and  they  are  frequently  ill- 
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conditioned^-  The  research  methodology  will  be  to  describe 
acceptable  procedures  for  updating  regressions.  The  procedures 
to  be  chosen  should  be  economically  attractive  and  numerically 
accurate,  in  the  case  that  the  number  of  observations  in  the 
regression  is  large  compared  to  the  number  of  variables  in  the 
regression  model.  Each  of  these  procedures  (algorithms)  has 
advantages  and  disadvantages  which  will  be  explained  below. 


E.  THESIS  ORGANIZATION 
1.  General 

The  main  purpose  of  this  thesis  is  to  bring  to  the 
attention  of  readers  numerically  acceptable  procedures  for 
adding  and  deleting  rows  and  columns  from  a  regression  model. 
For  the  case  of  a  single  row  to  be  inserted  or  deleted,  the 
algorithms  use  simple  techniques:  plane  or  hyperbolic 
rotations  and  the  Gram-Schmidt  process.  The  algorithms  for 
inserting  rows  are  stable,  but  the  problem  of  deleting  a  row 
may  be  ill-conditioned  and  some  algorithms  for  this  process 
may  be  numerically  unstable. 

The  need  for  updating  regression  results  arises  for 
various  statistical  or  numerical  reasons.  When  data  are 

^A  set  of  linear  equations  Bx  =  c  is  said  to  be  ill- 
conditioned  if  small  errors  or  variations  in  the  elements  of 
B  and  c  have  a  large  effect  on  the  solution  x. 


arriving  sequentially,  it  may  be  undesirable  or  impossible  to 
wait  for  all  the  data  before  obtaining  some  regression 
results.  In  various  time-series  problems,  one  is  interested  in 
the  changing  relationships  between  variables.  A  regression 
model  with  a  fixed  number  of  lagged  terms,  as  it  moves  over  a 
series  of  data,  generates  a  "window"  on  the  sample,  with  a  new 
observation  added,  and  an  old  one  deleted^,  as  the  window 
moves  to  the  next  point  in  the  series  [Ref.  5]. 

2.  Basic  Schedule 

The  procedures  for  updating  regressions  that  we  shall 
discuss  here  are  the  following; 

■  An  algorithm  based  on  the  normal  equations 

(Efroymson  M.  A.,  1960[Ref.  6],  Draper  N.  and  Smith 

H.  ,  1966[Ref.  7]);  this  had  been  the  standard 

introductory  approach  in  regression  courses.  In  this  algorithm 
the  regression  coefficients  are  solved  using  Gauss-Jordan 
elimination  on  the  normal  equations.  However,  the  problem  of 
solving  the  normal  equations  is  frequently  much  more  ill- 
conditioned  than  the  original  problem  of  solving  the 
overdetermined  linear  equations. 

■  Stepwise  regression  analysis  with  orthogonal 
transformations  (Lars  Elden  1972) [Ref.  8].  In  this 

^But  in  this  case  the  problem  can  be  ill-conditioned, 
because  the  rank  can  be  decreased  by  this  operation. 


algorithm  a  method  is  presented  using  QR-decomposition .  In  the 
QR-decomposition  or  factorization,  we  express  a  matrix  as  a 
product  of  an  orthogonal  matrix  Q  and  an  upper  trapezoidal 
matrix  R  (actually  R  stands  for  right  trapezoidal) .  Many  of 
the  ideas  used  in  this  method  have  been  proposed  by  Golub 
(1965)  [Ref.  9]. 

■  The  modified  Gram-Schmidt  triangular  factorization 
(W.  Gragg,  R.  Leveque,  J.  Trangenstein  1978) 
[Ref.  10].  This  approach  is  A  =  QR  factorization  with 
Q  having  orthonormal  columns  and  R  upper  triangular.  In  this 
algorithm  one  is  able  to  add  or  drop  regressors  (columns  of  A) 
or  observations  (rows  of  A)  using  essentially  only  the  storage 
needed  for  A  and  to  secure  numerical  accuracy  possibly  at  the 
expense  of  additional  computation  and  storage. 

■  Rank  revealing  QR-f actorization  (T.  Chan,  Hansen 
1986) .  In  this  algorithm  a  method  is  presented  for  computing 
a  column  permutation,  D'',  and  a  QR  factorization.  All  =  QR,  of 
an  n  by  m  (n  >  m)  matrix  A  such  that  a  possible  rank 
deficiency^  of  A  will  be  revealed  in  the  triangular  factor  R 
having  a  small  lower  right  block.  Notice  that  a  matrix  A  is 
rank  deficient  with  rank  deficiency  d  if  it  has  at  least  d 


'’ll  is  a  permutation  matrix  where  and  is  the  product 
of  PiP2 .  .  .  Pr,_i .  Notice  that  Pj  denotes  the  matrix  representation 
of  the  column  interchange  that  precedes  step  i. 

^This  means  that  the  columns  of  the  observation  matrix  A 
are  not  linearly  independent. 
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singular  values.  For  matrices  of  low  rank  deficiency,  the 
algorithm  can  reveal  the  rank  of  A,  and  the  cost  is  only 
slightly  more  than  the  cost  of  one  regular  QR  factorization. 
A  posteriori  upper  and  lower  bounds  on  the  singular  values® 
of  A  can  be  used  to  infer  the  numerical  rank  of  A. 

In  Chapter  II  we  discuss  the  general  theory  about 
updating  regressions.  A  process  for  adding  and  dropping 
regressors  follows.  Techniques  are  presented  with  stepwise 
regression  in  mind,  and  we  discuss  how  to  compute  the  various 
quantities  of  statistical  interest  using  the  algorithms. 

Chapter  III  mainly  covers  computational  details  of  the 
algorithms,  which  are  used  to  compute  the  regression 
coefficients.  In  this  chapter  also  a  simulation  is  applied  to 
the  basic  algorithms  for  validating  the  models  by  creating 
useful  numerical  results.  In  the  same  chapter  we  discuss  the 
measures  of  effectiveness  of  each  model  based  on  the 
previously  generated  numerical  results.  Chapter  IV  concludes 
the  thesis  and  offers  recommendations. 


®The  singular  values  of  a  matrix  A  are  the  "diagonal 
elements"  of  S,  one  of  the  three  component  matrices  that  A 
splitted  to  avoid  singularity. 
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II. 


BACKGROUND  THEORY 


A.  GENERAL  THEORY  FOR  FITTING  A  SPECIFIED  REGRESSION 

Several  techniques  can  be  used  to  solve  the  problem  of 
updating  regressions.  Efroymson's  algorithm  [Ref.  6]  is  one  of 
the  earlier  ones  used.  It  uses  the  method  of  Gauss-Jordan 
elimination  on  the  normal  equations  A^Ax  =  A^b.  A  more 
efficient  algorithm  is  to  use  the  Cholesky  factorization  of 
the  Gram  matrix  A'^^A  into  R^R  where  R  is  upper  triangular.  The 
solution  to  the  original  system  is  then  found  by  a  two  step 
triangular  solve  process: 

R  =  A  ^Jb,  Rx  =  y  =•  A  ^Ax  =  R'^y  =  A^b. 

Another  way  to  solve  the  above  problem  utilizes  orthogonal 
transformations.  This  approach,  called  QR  factorization,  is 
the  basis  of  the  thesis  algorithms  and  may  be  slightly  slower 
than  the  normal  equation  approach,  but  is  more  stable 
numerically.  The  QR  factorization  of  a  matrix  can  be  computed 
using  the  following  methods; 

■  Classical  Gram-Schmidt  (CGM) . 

■  Modified  Gram-Schmidt  (MGS) . 

■  Givens  rotations. 

■  Householder  reflection  (Elden's  algorithm). 
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■  QR  with  Column  Pivoting  for  the  Rank  Deficiency  case 
(Chan's  algorithm). 

■  Daniel-Gragg-Kaufman-Stewart  method  (Gragg-Leveque- 
Trangenstein ' s  algorithm). 

The  latter  algorithm  uses  rotations  and  the  Gram-Schmidt 
process  with  reorthogonal ization . 

Applying  the  QR  technique  on  a  square  nonsingular  system 
Ax  =  b  we  have  (QR)X  =  b  and  by  the  associative  law  of  matrix 
multiplication  Q(Rx)  =  b.  Defining  y  =  Rx,  it  follows  that  Qy 
=  b.  Hence,  Ax  =  b  is  solved  in  two  steps: 

1.  Solve  Qy  =  b  for  the  unknown  y. 

2.  Solve  Rx  =  y  for  the  unknown  x. 

The  second  system  is  solved  by  back  substitution.  The  first 
system  is  easy  to  solve  since  Q  is  orthogonal.  Multiplying  Qy 
=  b  by  Q^  yields  q‘’'q  =  Q'^b.  Since  Q  is  orthogonal,  Q^Q  =  I  and 
y  =  Q’‘'b.  When  A  has  more  rows  than  columns  and  is  of  full  rank 
the  above  process  yields  the  solution  of  the  least  squares 
problem  |1  b  -  Axll2^  =  minimum.  The  key  to  the  numerical 
stability  is  the  orthonormality  of  the  columns  of  Q.  To  avoid 
loss  of  orthogonality  Gragg-Leveque-Trangenstein ' s  algorithm 
applies  reorthogonal  ization  whenever  ||u||/||u!|  is  small  (say, 
less  than  72/2)  ,  where  u  and  u  is  the  vector  to  be 
orthogonal ized  and  its  orthogonal  projection,  respectively. 

Finally  if  A  is  rank  deficient  then  the  QR  factorization 
need  not  give  a  basis  for  range(A).  This  problem  can  be 
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corrected,  as  mentioned  below,  by  computing  the  QR 
factorization  of  the  column  pivoted  version  of  A,  AH  =  QR 
where  II  is  a  permutation,  and  applying  Chan's  rank  revealing 
scheme . 

In  the  least  square  problem,  statisticians  and  numerical 
analysts  use  different  notations  for  the  same  entities,  i.e., 
the  matrix  of  the  observations  of  independent  variables,  the 
vector  of  the  dependent  variables,  the  vector  of  random 
errors,  etc.  To  avoid  misunderstandings  caused  by  using  both 
notations  in  the  following  chapters,  we  built  the  following 
TABLE  I  of  notational  correspondents. 

Most  numerical  analysts  use  n  to  represent  columns  of  a 
matrix  and  m  to  represent  rows.  0t>''  :  interchanges  m  and  n, 
and  we  will  use  this  notatJon. 
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TABLE  I 


NOTATIONS  USED  IN  REGRESSION  ANALYSIS 


DESCRIPTION 

STATISTITIANS 

N,  ANALYSTS 

Independent  var.  matrix 

X 

A 

Dependent  var.  col.  vector 

Y 

b 

Vector  of  parameters  to  be 

P 

X 

estimated 

Rows  of  indep.  var.  matrix 

n 

n 

Cols  of  indep.  var.  matrix 

P 

m 

Vector  of  random  errors 

e 

r 

Unique  solution  to  the 

P 

X 

normal  equation 

The  vector  of  estimated 

Y 

means  of  the  dep.  var. 

The  idemootent  n  x  n  matrix 

P 

QQ^ 

The  residuals  vector 

e 

r 

Residual  sum  of  squares 

SS(Res) 

r^r 

Siqnific.  level  enter/stay 

pa/b 

V 

^The  numerical  analysts  describe  the  vector  p  =  Ax  =  b-r 
as  the  "orthoprojector  onto  15(A)". 


B.  METHODS  FOR  UPDATING  REGRESSIONS 

Before  starting  this  section,  let  us  introduce  some 
terminology.  The  upper  case  letters  denote  matrices,  lower 
case  letters  denote  vectors,  and  Greek  letters  denote  scalars. 
Thus  we  write 


following  the  notational  correspondents  of  TABLE  I.  For 
notational  convenience  we  set  x  =  p  =  b  and  the  normal 
equations  can  now  be  written  Xb  =  v  or  Ax  =  b. 


1.  Lars  Elden  (1960) 

In  Elden 's  algorithm  the  upper  triangular  matrix  R  in 
the  QR-decomposition  of  A  (in  the  model  Ax  =  b)  is  determined 
by  successive  Householder  transformations  so  that  after  k 
steps  A  has  the  form 

yC*)  \| 

=  I  0  ^k*i  ■  n-k 

ik)  (1)  {n-k-D 

r'^’  is  upper  triangular  and  T'^’  is  k  x  fm  -  k)  .  Furthermore, 

,  where  with  1  is  now  chosen  so 
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that  k  first  rows  are  left  unchanged  and 


•  jt+i 


'  0  \ 

\^k*i 


=  Xej^,  where  |X|  =  la**!  | 


After  m  steps  X  will  be  completely  triangularized : 


P i^P .  P jA  — 


In  stepwise  regression  analysis,  however,  the  final 
result  may  be  of  the  form 


r.  - 


Q^A 


2j  (pJ  2* 

0  A  j ' 


(2.2) 


Where  is  an  upper  triangular  p  x  o  matrix.  The 

corresponding  right  hand  side  and  the  residual  vector  are: 


c>*y  = 


and  r  = 


The  regression  coefficients  are  calculated  from  the 
system  R^P’b  =  c^p’.  In  most  cases  the  column  vectors  of  X  are 
not  added  to  the  subspace  in  the  natural  order,  so  that  only 
after  a  permutation  of  columns  is  q'^'a  of  the  form  shown  in 
equation  (2.2)  . 


a.  Choice  of  Vector  for  Enlarging  the  Subspace. 

We  shall  add  to  the  subspace  the  vector  that  will 
make  the  norm  of  the  residual  vector  decrease  as  much  as 
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possible.  Since  only  the  m  -  k  last  rows  of  the  matrix  are 
changed  in  the  kth  step  we  can  without  loss  of  generality 
restrict  ourselves  to  considering  the  first  step  of  the 
analysis.  We  have  the  linear  system  of  equations  Ax  =  b,  with 
A  and  b  as  in  (2.1).  Let  aj  =  (a^j,  a2j,  ...,  Onj)’''  denote  the 
jth  column  vector  of  A.  We  shall  multiply  (A  :  b)  by  an 
orthogonal  matrix  P  =  I  -  2ww^,  where  ||w||2®  =  1,  so  that  for 
some  j  we  have  Pa^  =  ae^,  where  |A|  =  ||aj||2-  After  this 

transformation  the  residual  vector  is  given  by  the  last  n  -  1 
components  of  Pb  and  since  we  were  to  decrease  the  norm  of  the 
residual  vector  as  much  as  possible,  j  shall  be  chosen  so  that 
the  first  component  (Pb)i  of  Pb  will  be  as  large  as  possible. 
Now  we  have  (Pb)i  =  (Pb)'^ei  =  (Pb)^Xei/X  =  b^P^Paj/X  =  b'^'a^/X. 
Thus  we  shall  select  j  so  that  the  quantity 


(b^aj) 

Ujl 


will  be  as  large  as  possible  [Ref.  9]. 


b.  Choice  of  Vector  for  Diminishing  the  Subspace. 
Suppose  that  after  a  number  of  steps  (A  •:  jb)  has  the 


form 


®The  Euclidean  length  of  a  vector  is  often  denoted  ||x|l2, 
also  called  the  2-norm,  since  the  components  of  x  are  raised 
to  the  second  power. 


||x||2  = 
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(*)\ 


(2.3) 


(*)  c 

0 

where  is  an  upper  triangular  fk  x  matrix.  Let  rp  =  (Pip, 

/32p/  •••  »Pppf  0/  •••  /  0)^  denote  the  pth  column  vector  of  Rj^. 
Now  if  the  ith  column  vector  of  A  is  removed  we  can  by 
successive  rotations  in  the  (i.  i-H)  .  M  +  1.  i  +  2 K  .  .  .  .  fk-1 ,  k) 
planes  transform  Rj^  to  triangular  form  apart  from  the  ith 
column.  That  is,  we  compute  the  coordinates  of  the  remaining 
column  vectors  of  A  in  the  orthonormal  basis  where  the  j.th 
base  vector  has  been  excluded. 

(6.) 

Let  denote  the  orthogonal  rotation  matrix  in 

the  (i .  i+1^  plane  which  deviates  from  the  unit  matrix  only  in 
the  elements  qj  j  =  qj+i,j+i  =  cos9j  and  qj.j+i  =  =  sinGj.  If 

0j  is  chosen  so  that 

cose,  = - —  ,  sine  = - - 

J  I  2  5  J  r— 5  5 

we  get,  Oj. ~  Oj.j*x^Pi.j*if  •  •  •Pj,j*x'Pj*i,j*x'^'  '  •  ~ 

=  (Px,jt-\i  ■  •  •fPj-i,j*i>Pj,d*if^'^'  ’  •  where 

=  \/ p].J*i*P^j*x,j*i  • 


Further  we  have 


(6^)  _  ,  f  /  n  n\ 
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where  P'j.j  =  cos9j  p^  j  and  p'j+i,j  =  -sinGj  Pj,j-  After  k  -  i  -  1 
rotation  we  have  transformed  to  the  form 

(Pii  ■  •  •  Pi.i-i  Pi,^  •  •  •  Pi,*  ^ 


(JO 


Pj-i,i-i  Pj-i,j  Pj-i.j*i 


P'j.i 


/ 

Pk.j 


P:/,i*i 

0 


B  ■  ■ 


Pi  1.* 
Pi,* 


P*-i,* 

0 


where  —  Q]c-x,k  #  •  •  •  » l?i,i*i  • 

The  preceding  can  be  illustrated  by  the  following  schematic 
example  (k  =  5,  j  =  3) 


'  X  X  X  X  X 

X  X  X  X  X 

X  X  X  X  X 

X  X  X  X 

rotation 
on  (3,4) 

X  X  X  X 

rotation 
on  (4,5) 

X  X  X  X 

XXX 

X  X 

K  X. 

XXX 

+  □  X 

.  X, 

XXX 

X  X 

^  +  □; 

where  x  =  any  nonzero  element,  +  =  nonzero  element  being 
introduced,  □  =  element  being  annihilated. 

To  update  the  transposed  inverse  =  X‘  we 

multiply  by  the  same  rotation  matrices  from  the  left  as  in  the 
following  schematic  example. 
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since  can  be  transformed  to  triangular  form  by  permuting 

its  columns,  it  is  easy  to  see  that  (Rq‘‘^  )  ^  will  be  triangular 


if  the  same  permutation  is  performed  on  its  rows.  Therefore 
the  jth  column  of  has  only  one  nonzero 
element.  Let  x,  =  (0,  ...,  0,  denote  the  jth 
column  vector  of  Then  Qj^Xj  =  Xe;,.  Since  ||Xj|l2  =  llQkXjll 
we  have  |Xl  =  ilx^ll2.  The  right  hand  side  of  (2.3)  is  multiplied 
by  Q,.  and  as  the  dimension  of  the  subspace  is  decreased  the 
increase  of  the  norm  of  the  residual  vector  will  come  from  the 
last  component  of  But 


WOk 


X 


Thus  in  each  step  we  have  to  find  for  which  value  of  j 

V  =  — 1 - 

is  minimal,  and  if  for  this  j  the  increase  of  the  norm  of  the 
residual  vector  is  not  significant  the  jth  column  vector  of  X 
is  removed  from  the  subspace  (the  variable  Xj  is  deleted  from 
the  regression) . 
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Inserting  a  Row  (Observation 


In  some  applications  it  may  be  desirable  to  insert 
a  row  to  the  matrix  of  observations  after  the  analysis  is 
finished  and  study  what  influence  that  row  can  have  on  the 
regression  coefficients.  This  can  be  done  without  having  to 
start  from  the  beginning  again  using  the  QR-f actorization  that 
has  already  been  done.  The  procedure  can  be  illustrated  by  the 
following  somewhat  simplified  example.  Suppose  we  have  the 
system  of  equations  Rx  =  c,  where 


The  system  is  augmented  by  a  row  (an  observation) : 


If  we  multiply  [R,-,  ^  <^0]  by  a  rotation  matrix  (Gj  in  the 

(1,  n+1)  plane  where  0,  is  chosen  so  that 
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cos  0^  =  Pii  /  /  sin 

we  obtain 


(1) 

^(1)  \ 

pii 

Pl2  ■  • 

■  Pin 

P22  ■  ■ 

•  P2n 

O  (1)  _  P  - 

-^0  ”  wi,n*l 

• 

■ 

Piin 

(1) 

-  (1) 

.  0 

Soil,  2  •  • 

•  Sn+l,n/ 

where 

piti  =  cos  0,  *  sin  0^  j=1.  .  .  .  ,n 

=  -sin  0j^  Pj^^j  +  cos  0j^  ^n*i,j  ^  =  2,  .  .  .  ,n. 

After  n  successive  rotations  in  the  (1,  n+1) ,  (2,  n+1) ,  .. 

(n,  n+1)  planes 


Pll  P12 

^(1)\ 

•  •  •  Plo 

P22 

•  .  •  P20 

-  On.n^l  <®n>  '  *  ' 

•  • 

■  • 

Pnn 

,  0  0 

•  •  ■  0  j 

C;  has  been  transformed: 

Co”’  =  .  /  Oi 

^(n)  w  (d) 

■  •  t  /  ^n*l 

Now  we  have  the  system 

Rg""'’  X  =  (c/^\  02^^^ 

r-  (n)  \  T 
•  •  *  »  '-'n  1 
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(the  component 


belongs  to  the  residual  vector) . 


d.  Removing  a  Row  (Observation) . 

Sometimes  it  may  be  desirable  to  remove  a  row  of 
the  matrix  of  observations  after  the  solution  has  been 
obtained.  This  can  be  done  in  a  very  simple  manner  without  the 
analysis  having  to  be  done  from  the  beginning  again.  Suppose 
that  the  matrix  A  of  observation  has  been  decomposed: 


Ql  A 


Ql  b 


A 


(2.4) 


Let  (u  =  1,  .  .  .  ,m)  denote  the  row  vectors  of  X.  Then  the 
normal  equat  .  3  of  the  system  Ax  =  b  can  be  written  in  the 
following  ■'.^rm: 

V*1  V=1 


If  ap  is  the  row  of  A  that  is  going  to  be  removed,  let 


/  J?  ' 


where  i*  =  -1 . 


Then  S^  S  =  R‘  R  -  a^'  a,,  =  R'^  Qa  R  "  Sp  = 


=  A-  A  -  ap-  ap  = 


r 

a,  a,. 
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It  is  easy  to  see  that  if  the  vector  c  in  (2.4)  is  augmented 
by  a  component  ib^,  the  corresponding  right  hand  side  of  the 
normal  equation  will  be 


E 


v-l,  \*p 


S  can  be  brought  to  triangular  form  by  successive 
multiplications  by  matrices  T.  T2^n*n  •  •  •  n+i  where  is 

not  unitary  but  complex  orthogonal  (Tj,  ^+1^  =  I)  •  Consider  the 
first  step: 


*11 


(2.5) 


Then 


3  = 


P'll  P'l2 


0  • 


•  t 


where 


31 


where  we  denote  with  double  prime  (")  the  final  components  of 
the  normal  equations.  Then  we  have  the  system  R"z  =  c"  to 

solve.  Notice  that  the  component  Ibp  belongs  to  the  residual 

vector.  The  hyperbolic  rotations  that  are  used  to  update  the 
regression  coefficient  by  working  only  on  the  matrix  R  are  not 
unitary.  Non-unitarv  transformations  can  destroy  numerical 
stability . 


2.  Gragg-Leveque-Trangenstein  (1978) 

These  algorithms  are  based  on  the  use  of  (orthogonal) 
plane  rotations  and  the  modified  Gram-Schmidt  process  with 
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reorthogonalization .  We  compute  the  following  quantities  of 
interest  in  a  regression  analysis,  with  stepwise  regression: 

■  The  least  squares  coeificients  b, 

■  The  residuals  r  =  y  -  Xb, 

■  The  ANOVA  tabled 

■  The  covariance  matrix  for  the  regression 
coefficients,  and 

■  Partial  F-tests  for  the  significance  of  a  given 
variable  in  the  regression. 


Let  us  have  entered  k  -  1  columns  of  X  in  the 


model  y  =  Xb,  and  we  wish  to  add  the  next  column.  This  means 
that  we  want  to  find  a  k  x  k  matrix  whose  columns  form  an 
orthonormal  basis  for  the  span  of  the  k  columns  of  X,  assuming 
that  is  known.  The  remaining  columns  of  X  are  regressed 

upon  the  first  k  columns.  Let  be  a  k  x  k  triangular  matrix 
which  maps  the  orthonormal  basis  into  the  original  basis.  Also 
is  the  orthogonal  projection  of  the  vector  y  of  dependent 
variable  onto  the  space  orthogonal  to  the  space  of  the  first 
k  columns.  In  other  words,  is  the  residual  vector. 

Mathematically,  with  k  -  1  columns  entered,  we  have  factored: 

®Here,  the  residual  sum  of  squares  is  computed  directly, 
and  not  by  subtracting  the  sum  of  squares  due  to  regression 
from  the  total  sum  of  squares. 


( 

] 

/ 

\ 

•^*-1  ^*-1  ^k-X 

X  y 

= 

Ojc-x  ^k-X  ^k-X 

0  J  0 

[ 

H 

O 

o 

where  Qk-i  is  n  X  'k-1 ^  and 

Qk-JQk-i  =  I;  (2.7) 

also  the  columns  of  and  are  orthogonal  to  the  columns 
of  Qk-i.  This  means  that 

Qk-X  =0  '  Qk-x  ^^k-x  =0 ;  ( 2  -  8 ) 

Also  is  (k-1)  X  (k-l)  upper  triangular;  and 

Cjt.j =(?*_!  *y.  (2.9) 

The  above  procedure  was  a  step  of  the  Gram-Schmidt 
process  that  can  be  viewed  as  a  partial  QR  factorization. 
Although  there  are  some  similarities  between  the  Gram-Schmidt 
factorization  and  Householder's  factorization,  there  are  also 
some  important  differences.  We  are  going  to  discuss  that  in 
the  next  chapter  during  the  comparison  process. 

To  update  the  above  factorization  (i.e.,  entering 
the  kth  column  or,  in  other  words,  to  add  the  kth  variable) , 
we  perform  the  following  steps; 

■  Repartition; 
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(2.10) 


( 

] 

/ 

X  y 

[ 

J 

Pk-l  ^k-1 


^k-1  ^k  ^Jk-l 

0®'  1  0®'  0 

0  0  J  0 

0®’  0  0*"  1 


Normalization,  and  the  equation  (2.10)  becomes: 


( 

Oir  1  — -A*  i"*-! 

IN 

0^  IN 
0  0 

0®*  0 

J  0 

1 

o 

o 

H 

o 

Orthogonal izat ion. 

with 

(qk  = 

obtain : 


^k  A*-g*(g**'A*) 


^k  ^k  ^k-1 

O’*  INI  qk^r^-i 

0  0  1  0 

V  0®’  0  o®*  1 


Relabel,  so  that  the  equation  (2.10)  can  be 


written  as; 


¥*»  r,  c»'i 


Ok  Ajc  ■Tjt 


0  J  0 
l.O®'  O®"  1 


The  above  mathematical  statements  complete  the 
insertion  of  the  kth  regressor.  Furthermore  the  work  required 
is  essentially  2n(p  -  k)  multiplications  and  additions.  The 
storage  space  required  and  the  part  of  storage  affected  by  the 
algorithm  is  illustrated  by  these  partitioned  matrices. 


35 


Suppose  now  that  we  have  to  enter  the  kth  variable 
(column)  into  the  model.  In  this  case  the  appropriate  column 
to  be  added  is  the  one  which  minimizes  the  norm  of  the 
residuals.  We  have  that  =  0  and  from  the 

orthogonalization  and  relabel  steps,  that 

By  the  Pythagorean  theorem, 

F*-if  -  F/  =  (2-11) 

so  when  we  add  the  kth  variable  to  the  model  the  change  in  the 
residual  is  (qk^r^-i)^.  Recalling  that  =  a^VllakH,  the  change 
can  be  expressed  as  (qk^rk_i)2  =  (ak^rk-i/||  3^11  ) ^.  So  the  chosen 
column  to  add  at  the  kth  stage  is  a  column  aj,  k<j<p,  for 
which  I  aj^r^-i  I /II  ajll  is  maximal.  A  test  for  significance  should 
be  satisfied  for  the  above  column  to  be  entered  into  the 
regression.  The  work  for  choosing  the  column  to  add  is  on  the 
order  of  2nfp  -  k)  multiplications  and  additions. 

To  compute  the  regression  coefficients  we  have  to 
follow  the  next  steps.  A  n  x  k  matrix  is  formed  by  the 
first  k  columns  of  X.  It  has  been  factored  Xk  =  QkRk,  where  Qk 
is  n  X  k  with  orthonormal  columns,  and  is  upper  triangular. 
The  projection  of  y  onto  the  range  of  X  can  be  written  as: 

y-^jt  =  OjtCt  =  xbj,  = 

So  the  regression  coefficient  can  be  computed  easily  by 
solving  the  following  triangular  system 
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^iPk  ^k> 

requiring  an  order  of  0 . 5k^  multiplications  and  additions. 

The  next  step  is  the  computation  of  the  total  sum 
of  squares  ||y||^.  A  smart  mathematical  computation  is  taking 
place  here.  After  computing  the  sum  of  squares  due  to  the 
regression  at  the  kth  stage  of  the  regression,  as: 

IM’  =  =  1=4’- 

since  c^  is  a  k-vector,  the  residual  sum  of  squares  is 
computed  directly  as  Hr^p,  rather  than  by  subtraction  which 
would  mean  a  loss  of  accuracy  due  to  cancellation.  This  is 
recommended,  especially  if  the  mean  residual  sum  of  squares  is 
to  be  used  in  sensitive  tests  for  significance  levels.  The 
common  criterion  for  terminating  the  selection  process  is  the 
ratio  of  the  reduction  in  residual  sum  of  squares  caused  by 
the  next  candidate  variable  to  be  considered  to  the  residual 
mean  square  from  the  model  including  that  variable.  This 
criterion  can  be  expressed  in  terms  of  a  critical  "F-to-enter" 
or  in  terms  of  a  critical  "significance  level  to  enter"  (SLE) , 
where  F  is  the  "F-test"  of  the  partial  sum  of  squares  of  the 
variable  being  considered.  We  can  compute  the  "F-to-enter", 
Fe,  for  the  kth  variable  by  using  the  residual  sum  of  squares 
before  and  after  the  insertion  as  in  (2.6).  Let  RSS^.^  = 
(r^-i)^(rij_i)  and  RSSj,  =  (r^)^(r,j)  be  respectively  the  residual 
sum  of  scpjares  after  each  variable  addition.  Also  let  (n  -  m) 
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be  the  degrees  of  freedom  for  the  last  model,  where  n  and  m 
are  the  numbers  of  rows  and  the  columns  of  the  current  matrix 


X  of  the  independent  variables.  After  the  above,  the  "F-to- 
enter"  is 

Fe  = 

RSSj^/  (n-jn) 

To  compute  the  covariance  matrix  of  the  regression 
coefficients  using  the  normal  equations,  usually  is 

computed.  For  avoiding  this  numerically  unstable  process,  a 
new  procedure  is  followed  in  this  algorithm.  Using  the  above 
orthogonal  decomposition,  it  is  found  that: 


where  The  approach  of  the  algorithm  is 

to  solve  Rk^Vk  =  I,  and  compute  (Xk^X^)*^  as  Vk’^Vk.  Hence,  Vk  is 
updated  as  follows: 


0  ^ 

«*ll 


r^^V, 


0 

1 


IM  * 


Thus,  Vk  differs  from  Vk_i  only  in  the  bordering  of  a  new  row 
and  column.  The  above  updating  of  Vk  is  mathematically 
equivalent  to  forwardsolving  Rk’Vk  =  I  directly.  Notice  that 
the  computation  of  variances  (i.e.,  the  diagonal  of  the 
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covariance  matrix)  of  the  requires  only  on  the  order  of  k- 
multiplications  and  additions.  The  variances  of  bj.  are  needed 
for  significance  tests  (i.e.,  t-test  and  F-test) . 

Having  entered  k  variables  (columns)  into  the 
regression,  we  have  the  following  factorization: 


i 

X  y 

= 

Ok  A*  ^k 

0  J  0 

V 

1  / 

,0’'  0*'  1, 

(k)  (p-k)  (1) 

where  is  n  x  k  with  orthonormal  columns,  is  k  x  k  upper 
triangular,  and 


=  0,  Ojc^J^k  =  0'  =  Ok'^y-  (2.12) 


Now,  to  delete  the  j.th  regressor,  where  1  <  i  <  k.  the  next 
steps  are  followed.  The  columns  of  X  are  permuted,  so  that  for 
some  permutation  matrix  P, 


1 

) 

1  1 

f 

] 

X  y 

II 

Ok  ^k  ^k 

) 

1  1 

[ 

I 

•^*,(1,1)  '^*.(1,2)  ^k,l  ^k,\ 


0 
0 

V  O*- 

O'-i) 


(2,2)  ^*,2  ^k.2 


0 

0*- 


0 

0 


I 

0 


0 

1 


(2.13) 


{k-j)  (1)  (p-k)  (1) 


For  k  =  6  and  1=3,  the  right-hand  factor  looks  like  the 
following: 
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^*.(1.2) 

°  ^Jt,(2.2)  ^j.2 


XXX 

XXX 

XXX 

XXX 


where  x's  denote  possible  nonzeros,  and  zeros  are  left  blank. 
After  that,  by  using  a  Givens  transformation  or  (rotation) , 
the  subdiagonal  of  this  matrix  is  annihilated.  A  Givens 
rotation  is  any  orthogonal  matrix  of  the  form 


where  Gj^  is  of  the  form 


1,0  0  ...  0  -yj 


The  j.  and  k  subscript  in  correspond  to  the  row  numbers 
associated  with  the  7*s:  The  first  y  is  in  row  k  and  the 
second  y  is  in  row  j.  Next  the  equation  (2.13)  is  partitioned 
and  after  using  the  orthogonality  of  Givens  rotations  and 
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performing  multiplications, 
obtained : 


the  following  equation  is 


( 

(1,1) 

^k.  (1.2) 

®:/.i 

^*.1  ' 

0 

0 

^^k.  (2,2) 
0 

0 

I 

0 

V 

1 

O’* 

0 

0^ 

1  , 

Furthermore  (2.14)  is  repartitioned  and  relabeled  to  get  its 
last  shape: 


/ 

\ 

[^*-1 

^k-i 

Qk-1 

^k  -1  ^J:-l 

0 

I  0 

\ 

/ 

o’- 

O’-  1 

(k-l) 

(p-k+l) (1) 

(k-l) 

(p-k+1) (1) 

Note  that  (2.14)  holds  with  k  replaced  by  k  -  1  and  that  the 
deleted  column  is  ready  to  reenter  the  regression  at  any  time. 
The  above  algorithm  uses  approximately  n ( p+2k-3i ) 
multiplications  and  additions. 

However,  the  chosen  column  to  drop  is  that  one 
which  yields  the  smallest  increase  in  the  residuals.  This 
increase  for  the  ith  independent  variable  is  |  ^ j  |  ^/  iUjP,  where 
is  the  jth  regression  coefficient,  and  l!Uj||^  is  the  jth 
diagonal  element  of  (Xj^'^X^)'^.  Thus  the  column  to  be  dropped,  if 
a  test  for  lack  of  significance  is  satisfied,  is  the  column  j. 
for  which  |l  u  Ji  is  smallest. 
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b.  Adding  and  Deleting  Observations 

Now  n  rows  of  X  have  entered  into  the  regression, 
and  an  additional  row  x*  and  observation  n  of  y  is  going  to  be 
inserted.  In  this  situation,  a  factorization  has  taken  place 
and  we  have: 


' 

(X„  y.) 

Qn  0 

'Rn 

y n*l 

= 

jr’’  n 

n, 

.0’’  1  0, 

.0*’  1, 

where  is  n  x  p  with  orthonormal  columns,  R,,  is  p  x  p  upper 
triangular,  also 

qJq„  =  I  and  P  X  p,  Qjr„  =  0,  and  =  Q„xy„.  (2.15) 
Furthermore  the  Givens  rotations  are  applied  and  after 
relabeling,  repartitioning  and  performing  some  elementary 
operations  we  get  the  following: 


\ 

H 

O 

Notice  that  (2.15)  holds  with  n  replaced  by  n  +  1  and  that  the 
above  algorithm  requires  n  +  1  additional  storage  locations 
for  g.  However,  we  can  easily  update  V  =  R"^,  since 

l=Rn^v^=iR,^  L  ' 

V  0  / 

and  we  apply  the  Givens  rotations  to  get: 
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X)  G 


iRn^x''  0> 


a+l 

I 


=  J?  *'v 

•^n+l  •'ll*!  ' 


Now  we  want  to  delete  the  ith  observation.  Then  if 
x'^'  is  the  jth  row  of  X,  n  is  the  j.th  entry  of  y,  q'*'  is  the  jth 
row  of  Q,  and  p  is  the  ith  entry  of  r,  a  permutation  matrix  P 
can  be  obtained  such  that: 


\  ] 

'  \ 

\ 

Ir  c\ 

p 

Xn  Vn 

= 

P 

Q  T 
“a  ^  n 

.  ) 

\ 

) 

,0^  1, 

'Rn 

O*"  0  . 

.O**  1. 

Next,  we  apply  the  Gram-Schmidt  process.  After  this  we  have: 
Q=o  =  <l-g*'g)  o^=-Qq, 

when  0  =  0,  (q^f  w  )  is  chosen  by  a  special  feature  of  the 
orthogonalization  code.  Now  if  we  use  this  orthogonalization, 
we  get: 


1  \ 

Xn-i  Ym 

- 

/  \ 

Q  ^  ^n-x 

'  I  q  o' 
0*'  o  Y 

(Rr, 

D  O 

0*"  0 

Q  5 

(Rr, 
o  a 

0'  Y 

,x^  n  , 

.g*"  W  p  ; 

.o**  0  1, 

.o'  1, 

.0'  1, 

0 


q^  pj\ 


ij 


Q  0  B 
Icr*"  1  o 


Next  we  choose  Givens  trainsf ormations  such  that,  (q^,  (i))G^  s 
(q^,  «)Gp.p+i.  .  .Gi  p+i  =  (0^,  T).  Since  I  (q^f  “)  II  =  Ir  it  follows 
that  r  =  ±1.  So  the  equation  (2.16)  can  be  written  as: 
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'  \  (  oVi? 

yn-1  =  Q  q  0^  Y 

n  j  l^g*"  ci>  p  jio*'  ijlo’'  lj(o^  1^ 

=  Qn-l  0  ^a-l  s’*  Y  • 

.  0^  T  p  Ji  0^  1  , 

because  the  matrix 


(2.17) 


(0  (On-:L  q*] 

Ig*"  <o;  0^  T  j 

has  orthonormal  columns  it  follows  q*  =  0  and  because  we  have: 


where  R„.i  is  an  upper  triangular  matrix.  However  from  the 
equation  (2.17)  above,  we  obtain  the  desired  factorization 
which  is: 


and  we  can  recover  the  dropped  row  by: 


JC*"  =  TS  ?!  =  TY  +  p  ■ 


Finally,  we  have  to  update  V  =  R Now, 
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I  0 
0*"  1 


0^ 

X*’  o' 

'  V. 

o' 

CO, 

1 

— 

G^G 

1 

V  <0 

CO; 

1  CO 

H 

1 

\ 

u 

/ 

.  0  X, 

V, 

TU**" 


•  r 


2e„.i^u+s 

XV 


A 


Since  T  •=  ±1,  it  follows  that  u*  =  0.  Hence  =  I,  and 

V„„i  is  the  desired  update  [Ref.  10]  . 


3.  Tony  F.  Chan  (1986) 

An  algorithm  is  presented  for  computing  a  column 
permutation  0  and  a  QR  factorization  AH  =  QR  of  an  n  by  m  (n 
>  m)  matrix  A  such  that  a  possible  rank  deficiency  of  A  will 
be  revealed  in  the  triangular  factor  R  having  a  small  lower 
right  block.  For  matrices  of  low  rank  deficiency,  the 
algorithm  reveals  the  rank  of  A,  and  the  cost  is  only  slightly 
more  than  the  cost  of  one  regular  QR  factorization.  An  upper 
and  lower  bound  on  the  singular  values  of  A  are  stated.  These 
can  be  used  to  infer  the  numerical  rank  of  A. 

A  very  useful  factorization  of  an  n  by  m  (n  >  m) 
matrix  is  the  QR  factorization,  given  by  All  =  QR,  where  IleR"™ 
is  a  permutation  matrix,  QeR"^^  has  orthogonal  columns  and 
satisfies  Q^Q  =  I,,,  and  ReR"”'  is  upper  triangular. 

If  A  has  full  rank,  then  R  is  nonsingular.  In  many 
applications  in  which  A  is  nearly  rank  deficient,  it  is 
desirable  to  select  the  permutation  so  that  the  rank 
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deficiency  is  exhibited  in  R  having  a  small  lower  right  block. 
For  if  R  is  partioned  as 

^  _  -^11  -^2 

~[o 

where  R22  is  d  by  d,  then  it  is  easy  to  show  that 
iRjjlj,  where  we  have  used  the  notation  to  denote 

the  ith  singular  value  of  A,  with  0i  >  O2  >  ...  >  o„. 

Therefore,  if  is  small,  then  A  has  at  least  d  small 

singular  values  and  thus  is  close  to  being  rank  d  deficient. 
The  converse,  unfortunately,  is  not  true.  In  other  words,  if 
A  has  d  small  singular  values,  then  it  is  not  guaranteed  that 
a  given  QR  factorization  of  A  has  a  small  iJZjjh  •  LetA^F  91“° 

be  the  matrix  of  order  n  illustrated  below: 


A„  =  dlasrd,  8,B^, 


where  s  and  c  satisfy  s^  +  c^  =  1.  For  n  =  50,  c  =  0.2,  we  have 
o,^(A„)  =  10’^.  On  the  other  hand,  A^  is  its  own  QR  factorization 

and  obviously  has  no  small  R22  block  for  any  value  of  d. 

Besides  being  able  to  reveal  rank  deficiency  of  A,  a 
QR  factorization  with  a  small  R22  block  is  very  useful  in  many 
applications,  such  as  in  the  rank  deficient  least  squares 
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problem  and  in  the  subset  selection  problem.  Therefore  a 
variety  of  techniques  have  been  proposed  to  compute  it.  Since 
QR  factorization  is  essentially  unique  once  the  ^  ’"mucation  11 
is  fixed,  these  techniques  all  amount  to  finding  an 
appropriate  column  permutation  of  A.  Perhaps  best  known  of 
these  is  the  column  pivoting  strategy.  Although  this  strategy 
is  usually  very  effective  in  producing  a  triangular  factor  R 
with  small  |'R22'!<  very  little  is  known  in  theory  about  its 
behavior,  and  it  can  fail  on  some  matrices.  Chan's  algorithm 
does  not  require  computing  the  SVD  of  A  and  is  most  closely 
related  to  one  recently  developed  by  Foster 
[Ref.  11]. 

a.  Revealing  Rank  One  Deficiency 

Assume  that  A  is  nearly  rank  one  deficient.  We 
would  like  to  find  a  column  permutation  of  A  such  that  the 
resulting  QR  factorization  has  a  small  pivotal  element  It 
turns  out  that  this  permutation  can  be  found  by  inspecting  the 
size  of  the  elements  of  the  singular  vector  of  A  corresponding 
to  the  smallest  singular  value  a,,.  This  procedure  was  first 
pointed  out  in  1976  [Ref.  12]. 

Assume  that  there  is  a  vector  x  e  R"  with  l!x||2  =  1 
such  that  Ax  ' 2  =  f,  and  let  11  be  a  permutation  such  that  if 
n’‘'x  =  y,  then  |  |  =  IlylL  and  iyllz  =  llxllz  =  !•  Such  a  x  can  be 
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obtained  by  the  LINPACK^°  condition  estimator.  Now,  because 
of  these  we  have  |  r?,, J  >  <Jl/n  and  furthermore 


Therefore , 


/  \ 


Q  ^Ax  =  U  n  n^x  =  Ry  = 


\Pnn^fl/ 


e  =  lAJflj  =  \Q^AX\^  -  \RyU  -  \Pnn^n\> 

from  which  we  have  the  result,  Ip^al  ^  e  .  Now  let  v  e  withfv^l  =  1 

be  the  right  singular  vector  of  corresponding  to  the 
smallest  singular  value  o^.  Then  we  have 

lAvIj  =  o^. 

Therefore,  by  the  above,  if  we  define  the  permutation  11  by 

I  (nV)  I,  =  IrL, 

then  All  has  a  QR  factorization  with  a  pivot  at  least  as 
small  as  \/na^  in  absolute  value. 

Since  only  the  permutation  11  is  needed,  it  is  not 
necessary  to  compute  the  SVL  of  A  in  order  to  find  v  exactly. 
In  practice,  one  can  use  a  few  steps  of  inverse  iteration  to 
compute  an  approximation  to  v  from  which  the  permutation  11  can 
be  determined.  In  the  more  interesting  case  where  the 


^“Together,  LINPACK  and  EISPACK  represent  the  state  of  the 
art  in  software  for  matrix  computation. 
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inverse  iteration  should  converge  rapidly.  This  suggests  a  two 
pass  algorithm  in  which  one  first  computes  any  QR 
factorization  of  A,  then  performs  inverse  iteration  with  R  to 
find  an  approximate  v,  then  determines  n,  and  then  computes 
the  QR  factorization  of  All. 


b.  Revealing  Higher  Dimensional  Rank  Deficiency. 

In  this  section,  we  consider  the  case  where  A  is 
nearly  d  deficient,  with  d  >  1.  Our  goal  is  to  find  a 
permutation  11  such  that  if 


is  the  QR  factorization  of  All,  with  R22  e  R‘*’“*,  then  liZjal  is 
small  in  some  norm. 

A  natural  way  to  extend  the  one  dimensional  result 
of  subsection  a  is  to  repeatedly  apply  the  one  dimensional 
algorithm,  for  d  =  l,2,...,Rii,  the  leading  principal 

triangular  part  of  R.  Suppose  that  we  have  already  isolated  a 
small  d  X  d  block  R22.  To  isolate  a  small  (d+1)  x  (d+i)  block, 
we  compute,  using  the  one  dimensional  algorithm  given  in 
subsection  a,  a  permutation  P  such  that  is  the  QR 

factorization  of  R^P  and  where  the  (n-d,  n-d)th  element 
is  small.  Then  with 
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fl  =  n 


(p  o' 

U  i) 


where,  P  is  a  permutation  matrix  such  that  P  e 
|(PMi|  =  |PVL  and  the  singular  vector  u  G  corresponding 

to  a^.,,,(Rii)  with  ||uii2  =  1,  and  =  a„in(Rii),  also 


0  =  0 


After  the  above,  it  can  be  easily  verified  that 


Alt  =  5 


0  ^2  , 


is  the  QR  factorization  of  aH. 

To  make  the  above  procedure  more  understandable, 
the  updating  process  is  illustrated  for  n  =  5  and  i  =  2.  The 
permutation  is  defined  by  moving  the  second  column  of  R  to  the 
last  column.  Thus 


R 


'x  X  X  X 

X  X  X  X 

(I\  X  JC  + 

□j  JC  + 

- 


where  again,  x  =  any  nonzero  element,  +  =  nonzero  element 
being  introduced  and  □  =  element  being  annihilated.  So  working 
from  the  left  in  planes  (i,i+l),  (i+l,i+2),  ...  ,  (n-l,n)  we 


* 
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annihilate  the  subdiagonal  of  Rn^  by  rotations.  Simultaneously 
we  compute  Q  =  QQ^.  The  computation  It  =11?^  is  trivial,  like 

that  of  Rll^.  If  crmj^n(Rii)  is  "tiny"  then  we  have  for  n  =  5, 


with  Q  having  orthonormal  columns.  Now  if  we  annihilate  the 
last  column  of  the  upper  trapezoidal  matrix,  as  before,  from 
the  bottom  to  top,  and  then  drop  the  last  columns  of  the  11  and 
R  matrices,  we  obtain  A  s  QR 

The  above  algorithm,  to  produce  the  desired  QR 
factorization,  is  based  on  the  following  two  assertions  for  i 
=  n,  n-1,  n-d+1: 

■  Rii  has  a  small  singular  value  so  that  the  (i,i)th 
element  of  is  guaranteed  to  be  small. 
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■  The  last  [  i .  e .  , 


the  (n-i)th]  row  of  is 


small . 

If  these  two  assertions  are  true,  then  the  lower 
(n-i+1)  X  (n-i+1)  block  of  R  is  small  and  we  have  the  desired 
QR  factorization.  But  these  two  assertions  are  true  because  of 
the  following  lemmas  proved  in  [Ref.  13]. 

Let  B  e  R"’^  be  a  matrix  containing  any  subset  of  k 
columns  of  A.  Then 

Also  if  the  matrix  W  =  .  .  .  ,Wn]  e  R"*’'  has  been 

computed  by  the  above  algorithm  then  it  should  satisfy  the 
following  properties: 

■  llwjl^  =  1, 

■  (wjj  =  0  for  j  >  i, 

■  i(wjj  =  II  Wiiu>=i/yi, 

■  llAnwJi^  =  <  aJA)  . 

Finally,  Chan's  algorithm  [Ref.  13] 
computes  a  permutation  11  and  a  QR  factorization  of  A  given  by 
All  =  QR  where  the  elements  of  the  lower  d  x  d  upper  triangular 
block  of  R  satisfies 

kiil  ^  ^  2^^-^^a^y/n 
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for  n-d  <  i  ^  j  ^  n. 

Very  often,  it  is  desirable  to  be  able  to  infer  the 
rank  of  A  from  its  QR  factorization  by  estimating  the  small 
singular  values  of  A  from  the  triangular  factor  R 
[Ref.  14].  For  this,  we  state  bounds  on  the  singular 
values  of  A  in  terms  of  the  matrices  R  and  VJ,  given  by  the 
following  inequality,  for  1  <  j  <  d, 

yJKW-a^  la 

where,  W  e  R"*",  R  e  R"”'  and  computed  by  the  rank  revealing 
algorithm,  and  W2^  denote  the  lower  right  i  by  j  upper 

triangular  blocks  of  R  and  W  respectively.  These  bounds  are 
proved  in  [Ref.  13]. 

Now  using  the  obtained  All  =  QR  factorization  we  can  solve 
the  least  square  problem  Ax  =  b  as  follows.  Let  r  be  the  rank 
of  the  n  X  m  matrix  A,  then  the  regression  coefficients  are 
given  by: 

X  ~  nni((Rii)  ^Qn.ib) 

where  i  =  1,  2,  3,  ...  ,  r.[Ref.  15] 

The  work  for  the  AD  =  QR  factorization  is  given  by: 

W(r)  =  in^(n  -  in/3)  +  Im^r  +  2in^r,  where  I  is  the  number  of 
inverse  iterations  used  at  each  step.  Usually  I  =  2  is 
sufficient  in  practice.  So 
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W(r)  =  m^(n  -  m/3)  +  4m^r. 


(2.18) 


C.  PROCESS 

In  the  problem  of  finding  a  vector  x  e  R"  such  that  Ax  = 
b  where  the  data  matrix  A  €  R"^  and  the  observation  vector 
b  €  R"'  are  given  and  m  >  n  when  there  are  more  equations  than 
unknowns,  we  say  that  the  system  Ax  =  b  is  overdetermined. 
Usually  an  overdetermined  system  has  no  exact  solution  since 
b  must  be  an  element  of  range  (A)  ,  a  proper  subspace  of  R"’. 

This  suggests  that  we  strive  to  minimize  ||  Ax  -  b||p  for 
suitable  choice  of  p.  Different  norms  render  different  optimum 
solutions.  However,  much  progress  has  been  made  in  this  area, 
and  there  are  several  good  techniques  available  for  1-norm  and 
®-norm  minimization.  The  oldest  method  for  solving  the  full 
rank  least  squares  problem  is  the  method  of  normal  equations. 
The  accuracy  of  the  computed  normal  equations  solution  depends 
on  the  square  of  the  condition  number,  and  the  algorithm  is 
not  always  accurate. 

For  the  above  reason  some  techniques  are  established 
based  on  the  QR  factorization  method.  The  Householder  and 
Gram-Schmidt  QR  approaches  to  the  least  squares  problem  are 
more  stable  than  the  normal  equation  method.  Furthermore 
techniques  for  rank  revealing  QR-factorization  (i.e.,  Chan's 
algorithm  discussed  in  subsection  3)  can  give  a  solution  to 
the  subset  selection  problem  even  if  A  is  nearly  rank 
deficient. 
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In  the  following  chapters  we  will  try  to  compare  the 
results  of  the  above  algorithms  computationally  to  verify  the 
effectiveness  of  each  on  the  least  squares  problem. 
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III. 


RESULTS  AND  COMPARISONS 


A.  PREVIOUS  COMPARISONS 

There  have  been  a  number  of  studies  comparing  Householder 
and  Modified  Gram-Schmidt  QR  techniques  to  form  an  orthonormal 
basis  of  11(A)  [Ref.  16].  Moreover,  in  Elden's  paper 
there  is  a  comparison  between  his  and  Efroymson's  algorithm 
for  updating  regressions.  There  are  comparisons  in  work  and 
accuracy  between  the  Classical  and  Modified  Gram-Schmidt 
methods  in  the  Gragg-Leveque-Trangenstein  paper.  Tony  Chan 
also  compares  his  column  permutation  QR  factorization 
algorithm  with  the  regular  QR  algorithm,  in  case  of  required 
work  [Ref.  17].  Chan's  comparison  is  applied  only  on 
the  QR  factorization  part  of  the  regression  updating 
procedure . 

Furthermore,  an  algorithm  for  solution  of  the  subset 
selection  problem  is  presented  in  a  technical  support  package 
of  NASA  Tech  Briefs  [Ref.  18]  that  can  be  mentioned 
as  a  modified  Chan's  algorithm.  So  a  comparison  can  be  done 
between  these  algorithms.  Finally  a  justification  of  the  use 
of  the  reorthogonalization  in  Gram-Schmidt  QR  factorization  is 
presented  in  the  Daniel-Gragg-Kaufman-Stewart  paper 
[Ref.  19]. 
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B.  MODEL  AND  DATA 


The  data  in  TABLE  II-l  will  be  used  to  illustrate  the 
model  selection  methods  in  the  full  rank  case.  In  the  table 
below  there  are  two  matrices,  the  13x5  matrix  X  consisting 
of  a  column  of  ones,  followed  by  the  4.  column  vectors  of  the 
observations  on  the  independent  variables,  and  the  13  x  1 
column  vector  Y  of  observations  of  the  dependent  variable.  In 
order  to  conform  with  previous  results,  the  data  utilized  in 
this  experiment  is  identical  to  that  used  in  Elden's  algorithm 
[Ref.  8].  We  began  the  stepwise  regression  analysis  by 
inserting  and  deleting  columns  of  the  observation  matrix  X  and 
we  continued  by  inserting  and  deleting  rows  of  the  model. 

The  data  in  TABLE  II-2  and  TABLE  II-3,  have  one  and  two 
dependent  columns  respectively.  We  obtained  the  first  of  them 
by  subtracting  columns  2  and  3  of  the  data  in  TABLE  II-l  and 
the  second  by  repeating  the  first  column  of  the  same  data. 
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TABLE  II-l 


CASE  STUDY  1  ^ INPUT  DATA^ 
MATRIX  OF  OBSERVATIONS 


MATRIX  OF  OBSERV.  ON  THE  INDEPENDENT  VARIABLES 

COLUMN 
VECTOR  OF 
OBSER.  ON 
DEPENDENT 
VARIABLES 

1 . 00 

7.00 

26.00 

6.00 

60.00 

78.50 

1.00 

1.00 

29.00 

15.00 

52.00 

74.30 

1 . 00 

11.00 

56.00 

8.00 

20.00 

104 .30 

1.00 

11.00 

31.00 

8.00 

47 . 00 

87 . 60 

1.00 

7 . 00 

52 . 00 

6.00 

33 . 00 

95.90 

1 . 00 

11.00 

55.00 

9.00 

22 . 00 

109.20 

1.00 

3 . 00 

71.00 

17 . 00 

6.00 

102.70 

1.00 

1.00 

31.00 

22.00 

44 . 00 

72 . 50 

1.00 

2 . 00 

54.00 

18.00 

22.00 

93.10 

1.00 

21.00 

47.00 

4.00 

26.00 

115.90 

1.00 

1.00 

40.00 

23.00 

34 . 00 

83.80 

1.00 

11.00 

66.00 

9.00 

12.00 

113 .30 

1 . 00 

10.00 

68 . 00 

8.00 

12.00 

109.40 

To  obtain  the  data  in  the  Table  II-2,  (a  non-full  rank 
matrix)  ,  we  replaced  the  column  4  of  the  Table  II-l  with  a  new 
column,  column  6.  This  new  column  was  obtained  by  subtracting 
columns  2  and  3  of  the  matrix  of  observations  in  Table  II-l. 
Now  the  new  matrix  of  observations  has  a  linearly  dependent 
column  and  so  it  is  rank  one  deficient. 
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TABLE  II-2 


CASE  STUDY 

2a  riNPUT  DATA) 

MATRIX  OF 

OBSERVATIONS 

COLUMN 

VECTOR  OF 

flATRIX  OF  OBSERV. 

ON  THE  INDEPENDENT 

VARIABLES 

OBSER.  ON 

DEPENDENT 

VARIABLES 

1 . 00 

60. 00 

7 . 00 

26.00 

-19 . 00 

78 . 50 

1 . 00 

52 . 00 

1.00 

29.00 

-28.00 

74.30 

1 . 00 

20.00 

11.00 

56.00 

-45.00 

104 . 30 

1 , 00 

47 . 00 

11.00 

31.00 

-20.00 

87 . 60 

1 . 00 

33 . 00 

7.00 

52 . 00 

-45.00 

95.90 

1.00 

22 . 00 

11.00 

55.00 

-44 . 00 

109.20 

1.00 

6.00 

3.00 

71.00 

-68 . 00 

102.70 

1.00 

44 . 00 

1.00 

31.00 

-30.00 

72.50 

1.00 

22 . 00 

2.00 

54.00 

-52.00 

93.10 

1.00 

26.00 

21.00 

47.00 

-26.00 

115.90 

1.00 

34 . 00 

1.00 

40.00 

-39 . 00 

83 . 8n 

1 . 00 

12 . 00 

11.00 

66.00 

-55.00 

113 .30 

1 . 00 

12 . 00 

10.00 

68.00 

-58 . 00 

109 .40 

Furthermore  in  Table  II-3  we  inserted  once  more  the  column 
1  of  the  matrix  in  Table  II-2  as  a  new  column  7.  This  column 
took  the  third  place  in  the  matrix  and  reduced  its  rank  to 
rank  two  deficient. 
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TABLE  II-3 


CASE  STUDY  2b  (INPUT  DATA^ 
MATRIX  OF  OBSERVATIONS 


MATRIX  OF  OBSERV.  ON  THE  INDEPENDENT  VARIABLES 

COLUMN 
VECTOR  OF 
OBSER.  ON 
DEPENDENT 
VARIABLES 

1.00 

60.00 

1.00 

7.00 

26.00 

-19.00 

78.50 

1.00 

52 . 00 

1.00 

1.00 

29.00 

-28.00 

74 .30 

1.00 

20.00 

1.00 

11.00 

56.00 

-45.00 

104 . 30 

1 . 00 

47 . 00 

1.00 

11.00 

31.00 

-20.00 

87 . 60 

1 . 00 

33 . 00 

1.00 

7.00 

52.00 

-45.00 

95.90 

1.00 

22 . 00 

1.00 

11.00 

55.00 

-44.00 

109.20 

1.00 

6.00 

1.00 

3.00 

71.00 

-68.00 

102.70 

1.00 

44.00 

1.00 

1.00 

31.00 

-30.00 

72.50 

1.00 

22.00 

1.00 

2.00 

54.00 

-52.00 

93.10 

1.00 

26.00 

1.00 

21.00 

47.00 

-26.00 

115.90 

1.00 

34 . 00 

1.00 

1.00 

40.00 

-39 . 00 

83.80 

1.00 

12 . 00 

1 . 00 

11 . 00 

66.00 

-55.00 

113 . 30 

1.00 

12 . 00 

1 . 00 

10.00 

68.00 

-58 . 00 

109.40 

C.  SIMULATION 

As  mentioned,  the  three  compared  algorithms  use  a  QR  < 

factorization  technique  to  solve  the  regression  problem.  But 
the  three  techniques  used  have  different  philosophies  and 
different  advantages  and  disadvantages.  However,  to  obtain  the 
numerical  results  needed  for  the  comparison  we  used  three 
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analogous  sets  of  MATLAB  codes.  It  is  important  that  Elden's 
algorithm  is  not  a  "full"  updating  regression  algorithm 
because  there  is  no  way  to  update  regressions  using 
Householder  QR  factorization.  Elden  adds  or  deletes  columns  or 
rows  at  the  beginning  or  the  end  of  the  observations  matrix 
and  does  not  explicitly  update  the  Householder  QR 
factorization.  Because  of  this  we  did  not  implement  Elden's 
algorithm.  Instead,  we  used  the  basic  QR  factorization 
algorithm  to  solve  each  individual  problem.  For  column 
insertions  and  deletions  on  the  right  of  the  matrix  our 
algorithms  are  computationally  equivalent  with  Elden's 
algorithms . 


D.  VALIDATION 

A  numerical  example  was  needed  to  illustrate  the  theory 
developed  in  Chapter  II,  so  for  each  of  the  three  algorithms 
we  ran  the  corresponding  MATLAB  codes  on  the  data  of  the  case 
study.  All  computations  are  done  on  a  286  PC  in  double 
precision,  with  a  relative  machine  precision  of  about  10'^®. 
Notice  that  during  the  stepwise  procedure  on  case  study  1, 
there  was  no  rank  deficient  case,  so  there  was  no  illustration 
of  the  effectiveness  of  Chan's  algorithm  to  reveal  the 
numerical  rank  of  a  given  matrix. 
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E 


MEASURES  OF  EFFECTIVENESS 


1.  Case  Study  1  (Full  Rank  Case) 

In  order  to  compare  the  performances  of  the  algorithms, 
the  STSC'S  Statistics/Graphics  package  STATGRAPHICS  4.0  was 
used  on  the  same  data.  The  results  of  the  algorithms  and 
STATGRAPHICS  stepwise  variable  selection  procedure  are 
presented  in  the  following  tables. 

STEPWISE  REGRESSION  ANALYSIS  WITH  ORTHOGONAL 
TRANSFORMATIONS 
NUMBER  OF  OBSERVATIONS  13 

STEP  1 


TABLE  III-l.  Summary  Statistics  for  Stepwise  Selection  of 
Variables  for  the  Data  in  TABLE  II  after  step  1. _ 


VARIABLE  ENTERED  AFTER  THE  COLUMN  OF  ONES:  Column  5 

(1) 

ELDEN 

(2) 

GRAGG - 

LEVEQUE- 

TRANGENST. 

. 

(3) 

CHAN 

(4) 

S  •  G .  S  . 

STATGRARH . 
STSC 

CONST. 

117 . 5679312 

117.5679312 

117 . 5679312 

117.567931 

VARIABLE 

COEFFICIENTS 

ALPHAS 

-0.7381618 

-0.7381618 

-0.7381618 

-0.738162 

RES. 

SUM 
SQUAR . 

883 . 8669169 

883.8669169 

883.8669169 

881.89616 

Fe 

LEVEL 

22.7985202 

22 . 7985202 

22.7985 

F  OF 

GENER. 

MODEL 

22. 7995203 

22 . 7995203 

22 .7995203 

22 . 80 

STEP  2 

TABLE  III-2.  Summary  Statistics  for  Stepwise  Selection  of 
Variables  for  the  Data  in  TABLE  II  after  step  2 . 


VARIABLE  ENTERED;  Column  2 

(1) 

(2) 

(3) 

(4) 

CONST. 

103 . 0973816 

103.0973816 

103 . 0973816 

103 . 097382 

VARIABLE 

COEFFICIENTS 

ALPHAS 

-0.6139536 

-0.6139536 

-0.6139536 

-0.613954 

ALPHA2 

1.4399583 

1.4399583 

1.4399583 

1.439958 

RSS 

74.7621122 

74 .7621122 

74.7621122 

74 .7621 

Fe 

108 .2239093 

108 .2239093 

108.2239093 

108.2239 

F  GEN. 
MODEL 

176.6269531 

176.6269531 

176.6269531 

176.6270 

STEP  3 

TABLE  III-3.  Summary  Statistics  for  Stepwise  Selection  of 
Variables  for  the  Data  in  TABLE  II  after  step  3 . 


VARIABLE  ENTERED  :  Column  3 

(1) 

(2) 

(3) 

(4) 

CONST. 

71 . 6483069 

71 . 6483069 

71.6483069 

71.648307 

VARIABLE 

COEFFICIENTS 

ALPHAS 

-0.2365402 

-0.2365402 

-0.2365402 

-0.236540 

ALPHA 2 

1.4519379 

1.4519379 

1.4519379 

1.451938 

ALPHA 3 

0.4161098 

0.4161098 

0.4161098 

0.41611 

RSS 

47.9727294 

47.9727294 

47.9727294 

47 . 9727 

Fe 

5.0258647 

5.0258647 

5.028647 

5.0259 

F  GEN. 
MODEL 

166.8316801 

166.8316801 

166.831o801 

16C.8317 
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STEP  N»  4 

TAB:^E  III-4.  Summary  Statistics  for  Stepwise  Selection  of 
Variables  for  the  Data  in  TABLE  II  after  step  4. 


VARIABLE  REMOVED:  Column  5 

(1) 

(2) 

(3) 

(4) 

CONST. 

52 . 5773489 

52.5773489 

52 . 5773489 

52.577349 

VARIABLE 

COEFFICIENTS 

ALPHA 2 

1.4683057 

1.4683057 

1.4683057 

1.468306 

ALPHA3 

0.6622505 

0.6622505 

0.6622505 

0.662250 

RSS 

57 .9044832 

57.9044832 

57.9044832 

57 . 9045 

Fe 

1.8632873 

1.8632873 

1.8632873 

1.8633 

F  GEN. 
MODEL 

229.5036971 

229.5036971 

229.5036971 

229 . 5040 

STEP  N»  5 

TABLE  III-5.  Summary  Statistics  for  Stepwise  Selection  of 
Variables  for  the  Data  in  TABLE  II  after  step  5. 


OBSERVATION  ENTERED:  ROW  3.  (a  second  time) 

(1) 

(2) 

(3) 

(4) 

CONST. 

52 . 6817201 

52 . 6817201 

52 . 681720 

VARIABLE 

COEFFICIENTS 

ALPHA2 

1.4584656 

1.4584656 

1.4584656 

1.458466 

ALPHA3 

0.6594452 

0.6594452 

0.6594452 

0.659445 

RSS 

59.9550974 

59.9550974 

59.9550974 

59.9551 

Fe 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

F  GEN. 
MODEL 

250 . 3437770 

250.3437770 

250.3437770 

250.3438 

STEP  6 

TABLE  III-6.  Summary  Statistics  for  Stepwise  Selection  of 
Variables  for  the  Data  in  TABLE  II  after  step  6. 


OBSERVATION  ENTERED;  Row  2.  (a  second  time) 

(1) 

(2) 

(3) 

(4) 

CONST. 

53 . 0380112 

53.0380112 

53.0380112 

53.038011 

VARIABLE 

COEFFICIENTS 

ALPHA2 

1.4484905 

1.4484905 

1.4484905 

1.448490 

ALPHA 3 

0.6549147 

0.6549147 

0.6549147 

0.654915 

RSS 

60.8055442 

60.8055442 

60.8055442 

60.8055 

Fe 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

F  GEN. 
MODEL 

312 . 7948771 

312.7948771 

312.7948771 

312.7950 

STEP  7 

TABLE  III-7.  Summary  Statistics  for  Stepwise  Selection  of 
Variables  for  the  Data  in  TABLE  II  after  step  7. 


OBSERVATION  REMOVED;  Row  1 

(1) 

(2) 

(3) 

(4) 

CONST. 

53 . 0380116 

53.0380116 

53.0380116 

53 . 038012 

VARIABLE 

COEFFICIENTS 

ALPHA2 

1.4484905 

1.4484905 

1.4484905 

1.448491 

ALPHA 3 

0.6549147 

0.6549147 

0.6549147 

0.6L4915 

RSS 

60.8055442 

60.8055442 

60.8055442 

60.8055 

Fe 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

F  GEN. 
MODEL 

312.7948771 

312.7948771 

312.7948771 

312.7949 

2.  Case  Study  2  (Rank  Deficient  Case) 

Here  we  want  to  test  the  thesis  algorithms  on  a  rank 
deficient  case.  So  we  used  the  singular  matrices  in  Tables  II- 
2  and  II-3.  The  first  matrix  has  one  linearly  dependent  column 
(column  5)  and  the  second  two  linearly  dependent  columns 
(column  3  and  6).  We  applied  the  corresponding  algorithms' 
MATIiAB  codes  to  those  matrices.  The  results  of  the  algorithms' 
effectiveness  are  shown  in  the  following  tables. 


TABLE  IV-1.  Summary  statistics  for  the  rank  deficient  case, 
(case  study  2a) . 


RESULTS  OF  CASE  STUDY  2a  REGRESSION 

(1) 

ELDEN 

(2) 

GRAGG- 

LEVEQUE- 

TRANGENST. 

(3) 

CHAN 

(4) 

S  .  G .  S  . 

STATGRARH 

STSC 

CONST . 

3 .2153*10^® 

-146.000000 

71.6483069 

VARIABLE 

COEFFICIENTS 

ALPHAS 

-0.0353*10^® 

2.000000 

-0.2365402 

ALPHA2 

-0.0194*10^® 

2 . 5676*10®'* 

1.4519379 

ALPHA 3 

-0.0173*10^® 

-2 . 5676*10®'* 

0.4161098 

ALPHA6 

-0.0132*10^® 

-2 . 5676*10®'* 

0.0000000 

RES. 

SUM 

SQUAR. 

5.7494*10®® 

45.8653934 

47.9727294 

F  OF 

GENER. 

MODEL 

9 .2803*10  ®'* 

116.4231889 

166.8316801 
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TABLE  IV-2.  Summary  statistics  for  the  rank  deficient  case, 


(case  study  2b) . 


RESULTS  OF  CASE  STUDY  2b  REGRESSION 

(1) 

(2) 

(3) 

(4) 

CONST. 

5.786*10^'' 

-7.6870*10^^ 

71.6483097 

VARIABLE 

COEFFICIENTS 

ALPHA5 

-0.007242 

3 .8663*10^2 

-0.2365402 

ALPHA7 

-5. 786*10^'’ 

7.6870*10'*^ 

0.0000000 

ALPHA 2 

0.708*10^'’ 

-3.8663*10^^ 

1.4519379 

ALPHA3 

-0.708*10^“ 

3.8663*10^^ 

0.4161098 

ALPHA6 

-0.708*10^'' 

3 ,8663*10^2 

0.0000000 

RSS 

59.17750 

1. 4738*10^® 

47.9727294 

F  GEN. 
MODEL 

212 . 9952033 

3 .953*10-31 

166.8316801 

A  discussion  of  the  above  results  is  given  in  the 
following  chapter. 
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IV.  CONCLUSIONS  AND  RECOMMENDATIONS 


A.  GENERAL 

The  major  emphasis  of  this  thesis  was  to  examine  the 
performance  of  three  algorithms  for  updating  regressions.  As 
mentioned,  the  algorithms  examined  were  Elden  (Householder), 
Gragg-Leveque-Trangenstein  and  Chan.  After  running  the 
corresponding  MATLAB  codes  we  obtain  the  results  shown  in 
Tak)les  III  and  IV.  The  discussion  of  these  results  will  be 
divided  into  two  categories:  the  accuracy  and  stability,  and 
thi  number  of  computations. 

1.  Accuracy  and  Stability 
a.  Full  Rank  Case 

The  results  of  the  case  study  1,  in  Table  III,  show 
ue  that  no  one  algorithm  uniformly  outperforms  any  other  with 
regard  to  accuracy  in  the  full  rank  case.  All  algorithms  give 
exactly  the  same  results  in  all  steps. 

b DcX i cient  Rank  Case 

Comparing  the  results  of  the  case  study  2  (TABLES 
IV-1  and  IV-2)  with  the  results  of  the  case  study  1  step  3 
(TABLE  I I 1-3)  we  can  see  that  they  are  identical  except  for 
the  coefficients  of  variables  ALPHA6  and  ALPHA? .  These 
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variables  are  used  only  in  the  case  study  2  and  they 
correspond  to  the  linearly  dependent  columns  of  the  matrix  of 
observations  (see  TABLES  II-l  and  II-2).  The  values  of  these 
variables'  coefficients  in  both  rank  deficient  cases  are  zero. 
Because  of  the  above  we  can  say  that  the  results  of  Chan's 
algorithm  in  the  rank  deficient  case  without  the  linearly 
dependent  columns  are  the  same  as  the  other  algorithms' 
results  in  the  full  rank  case.  However  in  case  study  2  (rank 
one  and  rank  two  deficient  cases)  only  Chan's  algorithm  gives 
reasonable  results.  The  results  of  the  other  two  algorithms 
are  totally  wrong.  Now  it  is  easy  to  understand  that  the 
philosophy  of  Chan's  algorithm  is  to  drop  the  linearly 
dependent  columns  of  the  matrix  of  observations  and  work  with 
the  rest  of  them  to  obtain  the  regression  coefficients. 

2 .  The  Number  of  Computations 

To  compare  the  volume  of  computation  of  the  present 
algorithms  we  consider  that  all  m  column  vectors  of  an  n  x  m 
matrix  A  are  added  to  the  subspace. 

a.  Computing  Regression  Coefficients  at  Each  Step 
In  the  case  where  the  regression  coefficients  are 
computed  at  each  step  the  work  of  algorithms  is  as  follows. 


The  number  of  flops“  of  Elden's  method  is  about 
2nmf,  (2(n  -  m)m^  +  4mV3  for  producing  the  QR  factorization, 
mV 3  for  updating  the  inverse  of  the  triangular  matrix  and  mV3 
for  computing  the  regression  coefficients) .  Elden's  algorithm, 
as  mentioned  in  Chapter  II  subsection  Bl,  is  updating 
regression  without  using  at  each  step  the  matrix  Q  but  working 
only  on  the  triangular  matrix  R.  This  procedure  avoids 
computations  on  matrix  Q  and  so  is  cheap. 

If  the  data  matrix  is  not  very  rank  degenerate  Chan 
uses  same  work  as  Elden  on  the  updating  regression  procedure, 
(Householder  without  updatina  Q)  .  Using  the  equation  (2.18) 
for  this  case  with  r  =  m  we  have  a  total  work  for  the  rank 
revealing  QR  factorization  of  W(m)  =  nm^  +  llm^/3.  Also  the 
amount  of  flops  for  computing  the  regression  coefficients  is 
mV3.  In  other  words  the  work  for  Chan's  QR  factorization  of 
RH  is  about  nm^  4m^  flops. 

Gragg-Levequc-Trangenstein ' s  algorithm  requires  8nm 
flops  for  column  insertion,  0  flops  for  column  deletion,  3m(m 
+  2n)  for  row  insertion,  m(3m  +  14n',  flops  for  row  deletion. 
In  this  case  the  regress!  n  coefficients  are  computed  at  each 
step  we  have  to  add  the  work  of  solving  the  triangular  system 
R^b,  =  c^.  That  requires  a  total  of  m^/2  flops.  This  amount  of 
flops  is  because,  as  mentioned  in  Chapter  II  subsection  B2 , 


‘^Flops  number  means  the  whole  number  of  multiplications 
and  additions. 
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this  algorithm  updates  the  regression  using  the  matrix  Q  for 
keeping  the  numerical  stability.  That  means  that  in  the  worst 
case  the  algorithm  requires  a  total  work  of  about  6m^  +  28nm 
+m^/2  flops. 

After  the  above  we  can  see  that  Elden's  and  Gragg- 
Leveque-Trangenstein ' s  algorithms  outperforms  Chan's 
algorithm.  As  mentioned,  in  this  thesis  we  always  examine  the 
case  in  which  the  observation  matrix  has  more  rows  than 
columns  (i.e.,  n  >  m) .  Keeping  a  ratio  n/m  =  2  Gragg-Leveque- 
Trangenstein ' s  algorithm  is  the  cheapest  for  n  bigger  than  18. 
This  means  that  for  big  applications  this  algorithm  is  the 
cheapest . 


b.  Computing  Regression  Coefficients  Once 

In  the  case  where  the  regression  coefficients  are 
computed  once  we  have  a  lower  order  work  for  this  computation 
which  can  be  ignored.  So  the  total  work  for  the  algorithms  is: 
Elden's  2nm^  -  m^/3.  Gragg-Leveque-Trangenstein ' s  6m^  +  24nm 
and  Chan's  nm^  +  llmV3 .  Keeping  the  same  ratio  as  above  Gragg- 
Leveque-Trangenstein '  s  algorithm  is  still  the  cheapest  for  n 
bigger  than  15. 

B.  RECOMMENDATIONS 

In  this  thesis  we  examined  three  procedures  of  QR 
factorization  to  update  the  variable  selection  problem.  The 
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above  factorization  can  be  also  used  on  several  other 
mathematical  and  statistical  procedures.  Important  research 
can  be  done  on  using  the  QR  updating  algorithms  to  solve  the 
linear  programming  problem  using  the  simplex  method  and 
Karmarkar's  algorithm. 

As  mentioned  in  Chapter  III  section  D,  we  used  Householder 
codes  to  simulate  the  procedure  of  Elden's  algorithm.  A  closer 
comparison  can  be  done  by  a  full  simulation  of  this  algorithm. 
Finally,  it  should  be  interesting  to  extend  this  thesis  in 
case  where  we  have  an  observation  matrix  with  fewer  rows  than 
columns  (i.e.,  n  <  m) . 
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APPENDIX  A.  MATLAB  CODES  FOR  HOUSEHOLDER'S  (ELDEN)  ALGORITHM 
The  codes  qrhf.m,  qrhup.m,  qrucd.iti,  qruri.m,  qrurd.m 
qrorth.m  and  rot.m  were  reproduced  here  with  Professor's  Gragg 
permission . 


function  W  =  qrhf (A) 

%  W  =  qrhf (A) : 

% 

%  W  is  a  pack  formed  array  which  contains  the  HOUSEHOLDER  QR 
%  FACTORIZATION  of  A.  Wfi  have  A  =  QR  with  Q  unitary  and  R 
%  upper  trapezoidal  with  nonnegative  diagonal  elements.  The 
%  commands  R  =  W,  [Q  R]  =  qrhup(R) ,  executed  by  qrh,  unpack. 
%  However  this  is  not  necessary,  and  in  fact  it's  inefficient, 
%  for  most  applications. 

%  Copyright  (c)  16  February  1991  by  Bill  Gragg.  All  rights 

%  reserved. 

%  qrhf  calls  sgn. 

%  begin  qrhf 

[n  m]  =  size (A) ; 
for  k  =  l:min(m,n) 

q  =  k:n;  u  =  A(q,k);  r  =  norm(u);  t  =  u(l); 
if  r  >  0 

t  =  u(l);  s  =  -  sgn(t);  u(l)  =  t  -  r*s; 

t  =  abs(t) ;  t  =  1  +  t/r;  d  =  sqrt(2*t) ; 

A(q,k)  =  u/d;  d  =  r*sqrt(t);  u  =  u/d; 

if  k  <  m 

p  =  k+l;m;  W  =  A(q,p)  ;  W  =  W  -  u*  (u '  *W)  ; 

t  =  W(l,:);  W(l, : )  =  s' *t;  A(q,p)  =  W; 

end 

end 

end 
W  =  A; 

%  end  qrhf 


function  [Q,R]  =  qrhup(R) 
%  [Q  R]  =  qrhup(R) ; 


73 


%  Computes  the  ELEMENTWISE  QR  factorization  of  A  given  the 
%  output  R.  Thus  [Q  R]=qrh(A) :=qrhup(qrhf (A) )  is  essentially 
%  equivalent  with  matlab's  function  qr,  except  that  we  execute 
%  an  inexpensive  unitary  diagonal  scaling  to  make  the  diagonal 
%  elements  of  R  nonnegative.  It  is  NOT  USEFUL  to  have  Q  in 
%  elementwise  form  to  solve  the  LS  problem,  norm(b  -  Ax)  = 
%  minimum.  The  purpose  of  matlab  having  Q  in  such  form  may  be 
%  to  avoid  having  to  explain  how  the  Householder  least  squares 
%  algorithm  really  works. 

% 

%  Copyright  16  February  1991  by  Bill  Gragg.  All  rights 
%  reserved. 

%  qrhup  calls  sgn. 

%  begin  qrhup 

[n  m]  =  size(R) ;  s  =  -  sgn (diag (R) ) ;  e  =  ones(n-m,l); 
Q  =  diag([s;e]);  q  =  m+l:n;  z  =  zeros (n-m, 1) ; 

root2  =  sqrt(2); 
for  k  =  min (m, n) : -1 : 1 

q  =  [k  q];  u  =  R(q,k);  r  =  norm(u) ; 
if  r  >  0 

u  =  u/(r/root2)  ;  T  =  Q(q,q)  ;  Q(q,q)  =  T  -  u*(u'*T)  ; 
end 

R(k,k)  =  r; 
if  k  <  n 

R(k+l:n,k)  =  z; 

end 

z  =  [  z  ;  0  ]  ; 

end 

%  end  qrhup 


function  W  =  sgn(Z) 

%  w  =  sgn(z)  or  W  =  sgn(Z); 

% 

%  For  z  a  complex  number  w  is  z/abs(z)  if  z  =  0  and  +  1  if 
%  z  =  0.  Thus  sgn  is  the  same  as  matlab's  sign  function  EXCEPT 
%  when  z  =  0.  We  always  have  abs(w)  =1.  W  is  the  elementwise 
%  (Schur)  sgn  function  of  the  complex  matrix  Z. 

%  Copyright  (c)  19  January  1991  by  Bill  Gragg.  All  rights 
%  reserved. 

%  sgn  calls  no  extrinsic  functions. 

%  begin  sgn 
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W  =  sign(Z);  p  =  find(Z  ==  0);  n  =  iength(p); 
W(p)  =  ones (n, 1) ; 

%  end  sgn 


Total  f]ops  (scalar  case,  see  csgn) : 

Real  ca.  0  flops.  Complex  case:  1  sqrt  +  4  mults  +  i 

add . 


function  [x,r]  =  qrhlsl(W,b) 

[X  r]  =  qrhlsl(W,b): 

Given  that  the  HOUSEHOLDER  QR  FACTORIZATION  of  A,  is  stored 
in  packed  form  in  the  array  W,  qhrs  SOLVES  the  least  squares 
problem  norm(b  -  Ax)  =  minimum  for  x.  It  also  efficiently 
computes  the  residual  r  :=  (b  -  Ax).  We  assume  that  rank (A) 
=  m  <=  n,  where  A  ^as  m  columns  and  n  rows. 

qhrs  calls  no  extrinsic  functions. 

begin  qrhs 

[n  m]  =  size(W);  root2  =  sqrt(2); 

%  Forward  solving:  computing  Q'b  =  H  (ni)  H  (m-1 )  .  .  . H  ( 1)  b . 
for  k  =  l:m 

q  =  k:n;  u  =  W{q,k);  rl  =  norm(u);  d(k)  =  rl; 
if  rl  >  0 

u  =  u/ (rl/root2 ) ;  v  =  b(q) ;  b(q)  =  v  -  u*(u'*v); 

end 

end 

%  Backsolving:  solving  Rx  =  bl  :=  b(l:m)  for  x. 
s  =  -  sgn (diag (W) ) ;  x  =  zeros(m,l); 
x(m)  =  s (m) ' *b (m) /d (m) ; 
for  k  =  m-1 : -1 : 1 

p  =  [k+1  p]  ;  x(k)  =  (s(k)  •  *b(k)  -  W  (k,  p)  *x  (p)  ) /d  (k)  ; 

end 

%  Computing  the  residual  r. 

ifm<n  r=b-  W*x;  else  r  =  0; 
end 

%  end  qrhlsl 


function  [ Eb , RSS , F, Fe]  =  regres2 (Q,R, P,RSSp,b, r) 
%  [Eb  RSS  F  Fe]=  regres2 (Q, R, P, RSSp, b, r) : 
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Eb  =  E(b),  is  the  expectation  of  beta  of  the  linear  least 
squares  problem,  no'-m(b  -  Ax)  =  minimum,  RSS  is  the  residual 
sum  of  squares  and  F  the  statistic  for  a  general  linear 
hypothesis.  Also  Fe  is  the  significance  level  to  enter,  SLE, 
"F-to-enter" ,  for  the  new  variable,  on  where  A  =  QRP '  is  the 
QR  general  FACTORIZATION  of  A.  p  =  A*x  is  the  orthogonal 
projection  of  b  onto  R(A)  and  the  residual  vector  r  =  b  -  pr 
=  b  -  Ax,  is  the  orthogonal  projection  of  b  onto  N(A'),  the 
orthogonal  complement  of  R(A) .  It  is  assumed  that  A  is  not 
a  zero  matrix. 

regresZ  calls  rows,  cols,  ones  and  norm. 

begin  regres2 

Eb  =  r  +  Q*R*P'*x; 

A  =  Q*R*P’;  n  =  rows (A) ; 

e  -  ones(n,l);  RSS  =  r'*r; 

f  =  (norm (v) -abs (e ' *b) /sqrt (n) ) * 

( norm ( V) +abs (e ' *b) /sqrt (n) ) ; 
df  =  (n-m)/ (m-1) ;  F  =  (df*f)/RSS; 

Ft  =  (abs(RSSp-RSS) * (n-m) )/RSS; 

%  end  regres2 


m  =  cols ( A) ; 
V  =  Q'*b; 
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APPENDIX  B.  MATLAB  CODES  FOR  GRAGG-LEVEQUE-TRANGENSTEIN ' S 
ALGORITHM 


function  [Q,R]  =  qruci (Q , R, a , i ) 

[Q  Rj  =  qruci (Q, R, a , i) : 

UPDATES  the  QR  f actori2at ion  A  =  QR  when  a  is  INSERTED  as 
COLUMN  i  of  A.  Q  has  orthonorma?,.  columns  and  R  is  upper 
trapezoidal  with  at  least  as  many  columns  as  rows. 

Copyright  (c)  20  July  1991  by  Pi 11  Gragg.  All  rights 

reserved , 

%  qruci  calls  qrorth  and  rot. 

%  begin  qruci 

[r  m]  size(R);  n  =  length(a);  [q  s  t]  = 

qrorth (Q, a)  ; 

R(:,i+l:m+l)=R(:,i:m);  R(:,i)=s;  m=m+l; 

if  r  <  n 

Q  =  [Q  q] ;  R  =  [R; zeros (l,m) ] ; 
r=r+l;  R(r,i)=t; 

end 

for  k  =  r-l : -1 : i 

p  =  k+ 1 :  m ;  q  =  k :  k+ 1 ;  [  c  s  t  ]  = 

rot ( R ( q , i ) )  ; 

R(q,i)  =  [t;0];  G  =  [c  -  s';s  c'];  QC^q)  = 

Q(: ,q)*G; 

R(q,p)  =  G' *R(q,p) ; 

end 

if  i  <  r 

q  =  i+l:r;  d  =  diag(R);  d  = 

sgn (d (q) ) ; 

D  =  diag(d) ;  Q(:,q)  =  Q(:,q)*D;  R(q,:)  = 

D' *R(q, : ) ; 

end 

%  end  qruci 


function  [Q,R,a]  =  qrucd(Q,R,i) 


%  [Q  R  a]  =  qrucd (Q,R, i) : 


c^o  c\o  <^o  <AP  o\P 


UPDATES  the  QR  factorization  A  =  QR  when  COLUMN  i  is  DELETED 
from  A.  Q  has  orthonormal  columns  and  R  is  upper  trapezoidal 
with  at  least  as  many  columns  as  rows.  The  deleted  column  is 
called  a  [Ref  19 ]  . 

Copyright  (c)  20  July  1991  by  Bill  Gragg.  All  rights 

reserved, 

%  qrucd  calls  rot. 

%  begin  qrucd 

if  nargout  >2  a  =  Q*R(:,i);  end, 

[r  m]  =  size(R)  ;  R(:,i)  =  [  ]  ,*  m  =  m  -  1; 

for  k  =  i:r-l 

p  =  k+l:m;  q  =  k:k+l;  [c  s  t]  =  rot (R(q, k) ) ; 

R(q/k)  =  [t;0]  ;  G  =  [c  -s ' ;s  c ' ]  ; 

Q(:,q)  =  Q(:,q)*G; 

if  k  <  m  R(q,p)  =  G'*R(q,p);  end 

end 

if  r  >  m 

r=r-l;  q=l:r; 

Q  =  Q( : /q) ;  R  =  R(q, : ) ; 

else 

s  =  sgn(R(r,r));  Q(:,r)  =  Q(:,r)*s; 

R(r, :)  =  s'*R(r, :) ; 

end 

%  end  qrucd 


function  [Q,R]  =  qruri (Q , R, a , j ) 

%  [Q  R]  =  qruri(Q,R,a, j)  : 

% 

%  UPDATES  the  factorization  A  =  QR  when  a'  is  INSERTED  as  ROW 
%  j  of  A.  Q  has  orthonormal  columns  and  R  is  upper  trapezoidal 
%  with  at  least  as  many  columns  as  rows. 

%  Copyright  (c)  20  July  1991  by  Bill  Gragg.  All  rights 

%  reserved, 

%  qruri  calls  rot. 

%  begin  qruri 


m 

=  length (a ) ; 

[n  r] 

= 

size (Q) ; 

n 

=  n  +  1  ; 

Q( j :n. 

I 

)  =  [zeros (l,r) ; 

Q(j  :n-l,  : )  ]  ; 

r 

=  r  1  ; 

QC  ,r) 

=  zeros (n, 1)  ; 

Q(j,r)  =  1; 

R 

=  [R;a' ] ; 

for  k  =  l;r-l 

p  =  k+ 1 :  m ; 

q  =  [k  r] ; 

[Cot]  =  rot(R(q,k) 

) 

r 

R(q,k)  =  [t; 

0]  ; 

G  =  [c  -s'  ;s 

C  ]  ; 

Q(:,q)  =  Q(: 

,q)*G; 
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if  k  <  m 


R(q/P)  =  G' *R(q,p) ; 


end 


end 

if  r  >  m 

r=r-l;  q=l:r; 

Q  =  Q( :  /q)  ;  R  =  R(q,  : )  ; 

else 

s  =  sgn(R(r,r));  Q(:,r)  =  Q(:,r)*s; 

R(r, : )  =  s'*R(r, : ) ; 

end 

%  end  qruri 


function  [Q,R,a]  =  qrurd(Q,R,j) 

[Q  R  a]  =  qrurd(Q,R, j ) : 

UPDATES  the  QR  factorization  A  =  QR  when  ROW  j  is  DELETED 
from  A.  Q  has  orthonormal  columns  and  R  is  upper  trapezoidal 
with  at  least  as  many  columns  as  rows,  a'  =  A(j,:)  =  Q(i,:)R 
is  the  deleted  row. 


%  qrurd  calls  qrorth  and  rot. 

%  Copyright  (c)  20  July  1991  by  Bill  Gragg.  All  rights 

%  reserved, 

%  begin  qrurd 

[n  r]  =  size(Q);  [r  m]  =  si2e(R);  t  =  [l:j-l  j  +  l;n]; 
if  r  <  n 

q  =  zeros (n,l);  q(j)=l;  r=r+l; 

Q(:,r)  =  qrorth (Q, q) ;  R(r,:)  =  zeros ( 1, m); 

end 

for  k  =  r-1 : -1 : 1 

P  =  k:m  ;  q  =  [k  r] ; 

[c  s  u]  =  rot (Q( j ,q)); 

Q(j,r)  =  u;  G  =  [-S  c ' ;c  s ' ] ; 

Q(t,q)  =  Q(t,q) *G; 

R(q,p)  =  G' *R(q,p) ; 

end 

r  =  r  -  1;  q  =  l:r;  Q  =  Q(t,q); 

a  =  R  ( r+ 1 ,  :  )  '  ;  R  =  R  ( q ,  : )  ; 

d  =  diag(R) ;  d  =  sgn(d) ;  D  =  diag(d) ;  Q  =  Q*D; 

R  =  D'*R; 

%  end  qrurd 


function  [q,r,s,k]  =  qrorth(Q,a) 

%  [q  r  c  k]  =  qrorth(Q,a)  or  q  =  qrorth(Q); 
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% 

%  ORTHONORMALIZES  column  a  against  the  columns  of  Q  using 
REORTHOGONALIZATIDN .  It  is  assumed  that  Q  has  (nearly) 
orthonormal  columns.  Let  [n  m]  =  size(Q).  An  error  message 
occurs  if  m  >  n.  For  m  <=  m  we  have 

[Q  a]  =  [Q  g][I  r] 

[  s]  with  Q'g  =0,  s  >=  C 

and,  if  m  <  n, 

q'q  =  1. 

If  m  =  n  we  take  q  :=  0  and  s  ;=  0.  If  a  is  not  input  we 
take  a  :=  0.  For  m  <  n  and  a  =  0  we  get  a  unit  vector  q  in 
the  orthogonal  complement  of  the  range  of  Q. 


%  qrorth  calls  no  extrinsic  functions. 

%  Copyright  (c)  20  July  1991  by  Bill  Gragg.  All  rights 

%  reserved, 

%  begin  qrorth 

[n  m]  =  size (Q) ; 

if  nargin  <2  a  =  zeros(n,l);  end 
if  m  >  n  error('Q  has  too  many  columns.'),  end 
if  m  ==  n  q  =  zeros(n,l);  r  =  Q'*a;  s  =  0;  k  =  0; 
return,  end 

norma  =  norm (a) ;  t  =  norma/2; 

r  =  Q'*a;  b  =  a  -  Q*r;  s  =  norm(b) ;  k  =  1; 
if  norma  ==  0  zflag  =1;  k  =  0;  end 
while  0  ==  0 

if  s  >  t  q  =  b/s;  break,  end 
if  k  >  4  error (' Process  did  not  terminate  in  5 
iterations.'),  end 

if  s  <=  t*eps/10 

[u  j]  =  min (norms (Q ')) ; 
if  s  ==  0 

s  =  eps/4 ;  b(j)  =  s; 
else 

b(j)  =  b(j)  +  s*eps/4; 

end 

norma  =  s ;  k  =  k  +  1/2; 

end 

t  =  s/2;  b  =  b  -  Q*(Q'*b);  s  =  norm(b);  k  =  k  +  1; 
end 

end 

if  zflag  r  =  zeros(m,l);  s  =  0;  end 
%  end  qrorth 
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function  [c,s,r]  =  rot(x,y) 

[c  s  r]  =  rot(x,y)  or  [c  s  r]  =  rot(z); 

Carefully  computes  the  Gram-Schmidt  QR  factorization 

z  :=  [X]  =  [c]  r 

[y]  [s] 

Note  that 

[X]  =  [c  -s' ]  [r]  =:  QR 

[y]  [s  c']  [0] 

is  a  full  QR  factorization.  This  is  a  "tool  of  the  trade"  in 
computational  linear  algebra.  Note  that  Q  and  Q'  are 
ROTATIONS  and  that 

[  C  s']  [X]  =  [r] 

[-S  c  ]  [y]  [0]  . 

Copyright  (c)  28  October  1990  by  Bill  Gragg.  All  rights 
reserved . 

rot  calls  no  extrinsic  functions. 

begin  rot 

if  nargin  <  2 
c  =  sign(x) ; 
if  y  >  0 

if  X  >  y 
t  =  y/x; 
u  =  1/u; 
else 

t  =  x/y; 
u  =  t/u; 

end 
else 

r  =  x;  if  r  ==  0  c  =  1;  end 

end 

end  rot 
Total  flops: 

Real  case:  1  sqrt  +  5  mults  +  1  add. 

Complex  case:  1  sqrt  +  11  mults  +  3  adds. 


y  =  X ( 2 ) ;  X  =  X ( 1 ) ;  end 
s  =  sign(y) ;  x  =  abs(x);  y  =  abs(y); 


u  =  sqrt(l  +  t*t) ;  r  =  x*u;  v  =  t/u; 
c  =  u*c;  s  =  v*s; 

u  =  sqrt(l  +  t*t)  ;  r  =  y*u;  v  =  1/u; 
c  =  u*c;  s  =  v*s; 
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function  [x,r]  =  qrmgsls (Q , R, b) 

%  [X  r]  =  qrmgsls(Q,R,b) : 

Solves  the  norin(b  -  Ax)  =  minimum  for  x  given  the  QR 
factorization  A  =  QR  and  computes  the  minimal  norm  rnorm 
using  the  MODIFIED  GRAM-SCHMIDT  process  (mgs) . 

qrmgss  calls  gfsb. 

begin  qrmgsls  ^ 

Compute  the  "Fourier  coefficients"  c  =  Q'b  and  the  residual 
vector  r=b-Ax=b-  QRx  =  b  -  Qc  =  (I  -  QQ')b,  WITHOUT 
COMPUTING  X.  ft 

r  =  b; 

for  i  =  l:cols(Q) 

q  =  Q(:,i);  t  =  q'*r;  c  =  [c;t]; 

end 

%  Compute  r  and  backsolve  for  x. 

x=gfsb(R,c);  r=b-  Q*c; 

%  end  qrmgsls 
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function  [Q , R, Pi , rank, W, delta]  =  rrqr(A,tol) 

%  R  =  rrqr(A,tol) 

%  [Q, R, Pi, rank, W, delta]  =  rrqr(A,tol) 

% 

%  Compute  a  rank  revealing  QR  factorization  of  A: 

%  A*Pi  =  Q*R  =  Q  [  R_ll  R_12  ]  , 

%  [  0  R_22  ] 

%  such  that  Q  is  m-by-n,  R  is  triangular  n-by-n,  and 
%  1)  R_ll  is  rank-by-rank  and  well-conditioned, 

%  2)  rank  =  numerical  rank  of  A  with  respect  to  tol , 

%  defined  as  the  number  of  singular  values  greater  than 

%  tol , 

%  3)  norm(R_22)  is  of  the  same  order  of  magnitude  as  the 

%  (rank+1) 'th  singular  value. 

% 

%  Also,  return  a  matrix  W  whose  columns  are  orthonormal  and 
%  span  an  aproximation  to  the  null  "N"  space  of  A,  and  the  S 
%  delta (rank: nl : 2 )  containing  lower  and  upper  bounds  for  the 
%  last  n-rank+1  singular  values  of  A  (the  first  rank-1  rows  of 
%  delta  are  zero) . 

% 

%  If  no  tol  is  specified,  sqrt (n) *norm (A, 1) *eps  is  default. 

% 

%  This  program  is  an  implementation  of  the  algorithm  described 
%  in  the  paper:  T.  Chan,  "Rank  Revealing  OR  factorizations". 
%  Lin.  Alg.  Appl.  88/89  (1987),  67-82. 

%  The  use  of  the  factor  c_max_ratio  was  suggested  in:  C.  H. 
%  Bischof,  &  P.  C.  Hansen,  "Structure  preserving  and  rank 
%  revealing  QR-f  actf'  izat ions" ,  SISSC,  to  appear 

[m,n]  =  size(A);  d«.  .ta  =  zeros(n,2);  c_max_ratio  =  10; 
if  (nargin==l) ,  tol  =  sqrt (n) *norm (A, 1 ) *eps ;  end 
if  (nargout>4),  W  =  [];  end 

%  Compute  an  initial  QR  factorization  A*Pi  =  Q*R. 

[Q,R,Pi]  =  qr(A);  Q  =  Q(:,l:n);  R  =  R(l:n,:); 

%  Prepare  for  the  iterations.  Estimate  smalle..  .  singular 
%  value  of  R, 
nu  =  0 ;  i  =  n ; 

[sest,v]  =  ccvl(R);  if  (nargout>5) ,  delta (n,l)  =  sest;  end 

%  Loop  until  a  singular  value  estimate  larger  than  tol  is 
%  found. 

while  sest  <  tol,  nu  =  nu+1; 

%  Update  the  matrix  W. 
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if  (nargout>4) ,  W  =  [ [v;zeros (nu-1 , 1) ] , W] ;  end 

%  Find  the  element  in  v  with  greatest  index,  numerically 
%  within  a  factor  cmaxratio  of  the  numerically  largest 
%  element. 

vmax  =  norm(v,inf); 
for  k=i:-l:l 

if  (vmax  <=  abs (v (k) ) *c_max_ratio) ,  break,  end 
end 

If  necessary,  generate  the  permutation  that  brings  the  pivot  , 

element  of  v  to  the  last  position,  apply  the  permutation  to 
W,  Pi,  and  R,  compute  a  new  QR  factorization  of  R(l:i,l:i), 
and  update  Q  and  R.  » 

if  (k<i) 

p  =  [k+l:i,k];  Pi(:,k:i)  =  Pi(:,p);  R(:,k:i)  =  R(:,p); 
if  (nargout>4),  W(k:i,:)  =  W(p,:);  end 
for  j=k:i-l 

[c,s]  =  gen_g(R(j , j :n) ,R(j  +  l, j :n)  )  ; 

[R(j  ,  j  :n)  ,R(j  +  l,  j  :n)  ]  = 
app_g_left(c,s,R( j , j :n) ,R( j+1, j :n) ) ; 

R(j+l,j)  =  O; 

[Q(: I j) .Q(: , j+1) ]  =  app_g_right(c,s,Q( : , j) ,Q( : , j+l) ) ; 
end 
end 

Provide  an  upper  bound  for  the  i'th  singular  value, 
if  (nargout>5) ,  delta (i, 2)  =  norm(R(i:n, i:n) ) ;  end 

Estimate  the  smallest  singular  value  of  R ( 1 : i-1 , 1 : i-1) , 
which  is  a  lower  bound  for  the  (i-l)'th  singular  value, 
i  =  i-1;  [sest,v]  =  ccvl (R(l: i, l: i) ) ; 
if  (nargout>5),  delta(i,l)  =  sest;  end 

end 

Finish  the  computation.  If  nargout  <  2,  return  R. 
if  (nargout>5) ,  delta(i,2)  =  norm (R (rank: n , rank : n) ) ;  end 
rank  =  n  -  nu;  if  (nargout>4) ,  W  =  Pi*W;  end 
if  (nargout  <  2),  Q  =  R;  end 

This  algorithm  is  described  by  T.  Chan  &  P.  Hansen  [Ref  15]  .  \ 


function  [x,r]  =  basic (Q, R, Pi , b, rank) 

[X  r]  =  basic(Q,R,Pi,b,rank) 

Compute  the  basic  solution.  If  the  RRQR  of  A  is 
A*Pi  =  Q*R  =  Q  [  R_ll  R_12  ]  , 

[  0  R_22  ] 
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%  where  A_ll  is  rank-by-rank,  then  the  basic  solution  is 
X  =  Pi*[  inv(R_ll)  0  ]*Q'*b  . 

[  0  0  ] 

Ref.:  G.  H.  Golub  &  C.  F.  Van  Loan,  "Matrix  Computations", 
Johns  Hopkins,  1989.  Subsection  5.5.6. 

%  Per  Christian  Hansen,  UNI-C,  07/11/90. 

X  =  Pi( : , irrank) * (R(l:rank,l:rank) \(Q( : , l:rank) '*b) ) ; 
r  =  b  -  Q*R*Pi*x; 


85 


LIST  OF  REFERENCES 


1.  Rawlings,  J.  O. ,  Applied  Regression  Analysis:  A  Research 
Tool .  Wiley  &  Sons,  1988. 

2.  Seber,  G.  A.,  Linear  Regression  Analysis.  John  Wiley  & 
Sons,  1977. 

3.  Mallows,  C.L.,  Data  Analysis  In  a  Regression  Context. 
Proceedings  of  University  of  Kentucky  Conference  on 
Regression  With  a  Large  number  of  Predictor  Variables, 
Department  of  Statistics,  University  of  Kentucky. 

4.  Hocking,  R.  R. ,  "The  analysis  and  selection  of  variables 
in  linear  regression".  Biometrics .  vol.  32,  pp  17-22, 
1973  . 

5.  Chambers  J.  M. ,  "Regression  Updating",  Journal  of  the 
American  Statistical  Association,  vol.  66,  pp  744-748, 
1971. 

6.  Efroymson,  M.  A.,  Multiple  Regression  Analysis. 

Mathematical  Methods  for  Digital  Computers,  vol.  I,  edit. 
Anthony  Ralston  and  Herbert  S.  Wilf,  New  York:  John  Wiley 
&  Sons,  pp  191-203,  1960. 

7.  Draper,  N.  R.  ,  and  Smith,  H. ,  Applied  Regression  Analysis. 
New  York:  John  Wiley  &  Sons,  1966. 

8.  Elden,  L.  ,  Stepwise  Regression  Analysis  with  Orthogonal 
Transformations  .  Linkoping  University  Report  MAT-R-1972-2  , 
Linkoping,  Sweden,  1972. 

9.  Golub,  G.,  "Numerical  Methods  for  Solving  Linear  Least 
Squares  Problems",  Numerische  Mathematik.  vol.  7,  pp  206- 
216,  1965. 

10.  Gragg,  W.  B.  ,  Leveque,  R.  J.,  Trangenstein ,  J.  A., 
Numerically  Stable  Methods  for  Updating  Regressions, 
Journal  of  the  American  Statistical  Association,  vol  74, 
pp  161-168,  March  1979. 

11.  Foster,  V.  L.  ,  "Rank  and  Null  Space  Calculations  Using 
Matrix  Decomposition  Without  Column  Interchanges",  Linear 
Algebra  Applications,  vol  74,  pp  47-71,  1986. 


86 


r 


12.  Golub,  H.  G.,  Klema,  V.,  Stewart,  W.  G, ,  Rank  Degeneracy 
And  Least  Squares  Problems.  Technical  Report  STAN-CS-76- 
559,  Computer  Science  Department,  Stanford  University, 
1976. 

13.  Chan,  F.  T. ,  "Rank  Revealing  QR  Factorizations",  Linear 
Algebra  and  its  Applications,  vol  88/89,  pp  67-82,  1987. 

14.  Karasalo,  I.,  "A  Criterion  For  Truncation  of  the  QR 
Decomposition  Algorithm  for  the  Singular  Linear  Least 
Squares  Problem",  BIT .  vol  14,  pp  156-166,  1974. 

15.  Chan,  T.  F.,  Hansen,  P.  C.  ,  "Some  Applications  of  the  Rank 
Revealing  QR  Factorization",  CAM  Report  90-09.  Department 
of  Mathematics,  UCLA. 

16.  Hager,  W.  W. ,  Applied  Numerical  Linear  Algebra.  Prentice 
Hall,  1988. 

17.  Chan,  T.  F.,  "On  the  Existence  and  Computation  of  LU- 
Factorizations  with  Small  Pivots",  Mathematics  of 
Computation .  vol.  42,  pp  738-754,  August  1984. 

18.  Technical  Support  Package,  Algorithm  for  Solution  of 
Subset-Regression  Problem,  NASA  Tech  Briefs.  ARC-12145. 

19.  Daniel,  W.  J.,  Gragg,  W.  B. ,  Kaufman,  L. ,  Stewart,  G.  W. , 
"Reorthogonalization  and  Stable  Algorithms  for  Updating 
the  Gram-Schmidt  QR  Factorization",  Mathematics  of 
Computations .  vol.  30,  pp  772-795,  October  1976. 


87 


